Genome-wide detection of terpene synthase genes in holy basil (Ocimum sanctum L.)

Holy basil (Ocimum sanctum L.) and sweet basil (Ocimum basilicum L.) are the most commonly grown basil species in India for essential oil production and for the biosynthesis of volatile and non-volatile phytomolecules of commercial significance. The aroma, flavor and pharmaceutical value of Ocimum species derive from their essential oil, which consists mostly of monoterpenes and sesquiterpenes. Terpene synthase genes, which are involved in terpenoid biosynthesis, have been identified and characterized in a large number of plants. The goal of this study is to discover and identify the putative functional terpene synthase genes in O. sanctum. A local HMMER search (HMMER 3.0 hmmsearch) against the Pfam-A database was performed for the two Pfam domains PF01397 and PF03936 on a set of 13 well-sequenced and annotated plant genomes together with the newly sequenced genome of O. sanctum. Using this search method, 81 putative terpene synthase genes (OsaTPS) were identified in O. sanctum; of these, 47 were putatively functional genes, 19 were partial genes, and 15 were probable pseudogenes. All identified OsaTPS genes were compared with those of other plant species, and phylogenetic analysis classified them into the TPS-a, -b, -c, -e, -f and -g subfamily clusters. This genome-wide identification of OsaTPS genes, together with their phylogenetic analysis and secondary metabolite pathway mapping predictions, provides a comprehensive view of the TPS gene family in Ocimum sanctum and offers opportunities for the characterization and functional validation of a number of terpene synthase genes.


Introduction
Terpenes are a large class of plant specialized secondary metabolites derived from five-carbon (C5) isoprenoid units. These precursors are produced by two biosynthetic pathways: the methylerythritol phosphate (MEP) pathway in the chloroplast and the classical mevalonate (MVA) pathway. The MVA pathway, also known as the isoprenoid pathway or HMG-CoA reductase pathway, operates in the cytosol [1,2]. It produces the five-carbon building blocks of terpenoids, isopentenyl pyrophosphate (IPP) and dimethylallyl pyrophosphate (DMAPP) [3,4], which are used to make isoprenoids. IPP and DMAPP are combined by prenyltransferases [...] from approximately 20 to 150 in sequenced plant genomes, with the sole exception of the moss Physcomitrella patens, which possesses a single functional TPS gene, as shown in Table 1.
Because of its medicinal properties, Ocimum sanctum (holy basil) has great importance in Indian traditional medicine and Ayurveda. It is a sweet-scented, pubescent herb nearly 3 to 100 cm in height that grows abundantly in tropical, sub-tropical and warm temperate regions. Ocimum sanctum plays a significant role in herbal as well as modern medicine. Many studies have focused on the medicinal prospects of this plant in the form of crude drugs, essential oil or pure compounds. O. sanctum shows antidiabetic, antimicrobial, antioxidant, anti-inflammatory, antinociceptive, antifertility, anticancer, anthelmintic and cardioprotective properties [22,23]. Several Ocimum species produce essential oil in which methyl eugenol is a major constituent, and this compound is reported to have anti-cancer properties [24,25]. Additionally, Ocimum terpenoids are used as biodegradable herbicides because of their phytotoxic activity [26]. The Lamiaceae is the sixth largest family of flowering plants and plays a significant role in the natural products economy, with uses in the food, pharmaceutical, flavor and perfumery industries. This family shows a remarkably high degree of secondary metabolite diversity, especially in terpenoids [27]. Because of their importance, a number of TPS genes responsible for terpenoid biosynthesis have been explored and characterized in many species of the Lamiaceae family that are closely related to Ocimum sanctum, including Rosmarinus officinalis, Mentha piperita [28][29][30] [40], Salvia pomifera [41], Salvia sclarea [42,43] and Salvia divinorum [44]. In prior studies, at least 44 monoterpenes and 39 sesquiterpenes have been reported [27,45].
The volatile constituents of various Ocimum sanctum tissues, such as leaves, flowers, and seeds, have been examined previously and consist mostly of monoterpenes, sesquiterpenes and diterpenes [46], but only a few terpene synthase genes have been characterized in Ocimum species. The characterized Ocimum TPS genes include geraniol synthase [47]; R-linalool synthase, (-)-endo-fenchol synthase, selinene synthase, terpinolene synthase, β-myrcene synthase, α-zingiberene synthase, γ-cadinene synthase, and germacrene-D synthase (all monoterpene or sesquiterpene synthases) [48,49]; and one triterpene oxidosqualene synthase [50]. Because whole-genome sequencing data were unavailable, the actual number of TPS genes in Ocimum sanctum has remained unknown. In this study, we identified 47 putative functional terpene synthase genes in O. sanctum and classified them into subfamilies. Comparative sequence analysis with closely related species suggests the possible biological roles of these predicted O. sanctum terpene synthases. These results may provide insight for further characterization of the putatively functional terpene synthase genes in holy basil.

Data retrieval and identification of TPSs in O. sanctum
The genome and proteome data of the 13 plant species were downloaded from (https://www.ncbi.nlm.nih.gov/genome/) and (https://www.ncbi.nlm.nih.gov/assembly/). The genome data of Ocimum sanctum were taken from the in-house sequenced genome repository. The two terpene synthase specific domain models, the Pfam N-terminal domain (PF01397) (https://pfam.xfam.org/family/PF01397#tabview=tab6) and the Pfam C-terminal domain (PF03936) (https://pfam.xfam.org/family/PF03936#tabview=tab6), were downloaded from the Pfam database [51]. The standalone tool HMMER version 3.1b2 was downloaded [52] and used to search the predicted proteome of Ocimum sanctum, together with the 13 downloaded proteomes, using the PF03936 and PF01397 domain models as queries; a significance threshold (E-value < 10⁻³) was set for the identification of candidate TPS genes. The candidate genes were also inspected with the ScanProsite tool (https://prosite.expasy.org/scanprosite/) to identify terpene synthase motifs. The FGENESH gene prediction tool (www.softberry.com) was used for the initial annotation of the predicted Ocimum sanctum terpene synthase (OsaTPS) genes. The predicted OsaTPS were then checked for full length: to extend them to full length, the upstream and downstream areas of the conserved scaffold region were screened, and a reverse BLASTx search was used to confirm the identity of each putative functional TPS gene. Of the 90 OsaTPS genes identified, 9 were excluded because of short sequences or stop codons in the translated gene. Of the 81 selected OsaTPS genes, only 47 were taken for further analysis; the remaining 19 partial OsaTPS and 15 probable pseudogenes were retained separately, as shown in S1 and S2 Tables. Some genes that do not cover the full-length gene sequence were therefore also included in this study. The intron-exon gene structure and motif representation of the putative OsaTPS genes were determined with the Gene Structure Display Server version 2.0 [53].
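The E-value filtering step described above can be illustrated with a minimal Python sketch. It assumes the search results were written in HMMER's tabular per-target format (`hmmsearch --tblout`), which the text does not state; the sequence names, scores and Pfam version suffixes in the sample are invented for illustration, and `candidate_hits` is a hypothetical helper, not part of HMMER.

```python
from typing import Dict, Set

# Hypothetical excerpt in `hmmsearch --tblout` layout. Columns used here:
# 0 = target name, 3 = query (Pfam) accession, 4 = full-sequence E-value.
# All names and numbers below are invented for illustration.
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias
Osa_g001 - Terpene_synth PF01397.20 1.2e-45 150.3 0.1 2.0e-45 149.6 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein
Osa_g001 - Terpene_synth_C PF03936.15 3.4e-60 198.7 0.0 4.1e-60 198.4 0.0 1.0 1 0 0 1 1 1 1 hypothetical protein
Osa_g002 - Terpene_synth_C PF03936.15 5.0e-04 20.1 0.0 6.0e-04 19.8 0.0 1.0 1 0 0 1 1 1 1 hypothetical protein
Osa_g003 - Terpene_synth PF01397.20 2.0e-02 12.5 0.0 3.0e-02 12.0 0.0 1.0 1 0 0 1 1 1 0 hypothetical protein
"""


def candidate_hits(tblout_text: str, max_evalue: float = 1e-3) -> Dict[str, Set[str]]:
    """Map each target sequence to the Pfam accessions that hit it below max_evalue."""
    hits: Dict[str, Set[str]] = {}
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        if len(fields) < 5:
            continue  # malformed row
        target, accession, evalue = fields[0], fields[3], float(fields[4])
        if evalue < max_evalue:
            # Drop the Pfam version suffix, e.g. "PF01397.20" -> "PF01397".
            hits.setdefault(target, set()).add(accession.split(".")[0])
    return hits


hits = candidate_hits(SAMPLE_TBLOUT)
both_domains = sorted(t for t, doms in hits.items() if doms == {"PF01397", "PF03936"})
print(sorted(hits))   # sequences with at least one significant domain hit
print(both_domains)   # sequences carrying both TPS domains
```

Whether a candidate must carry both domains or only one is not specified in the text, so the sketch reports both views and leaves that choice to the reader.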

OsaTPS sequence annotation and secondary metabolite pathway genes prediction
A BLAST search of the OsaTPS genes was performed against the UniProt database, and Gene Ontology (GO) terms were assigned to each sequence based on the GO terms annotated to its corresponding homolog in the UniProt database.

In-silico identification of putative terpene synthase genes in O. sanctum
In this study, 81 putative TPS genes were identified in the Ocimum sanctum genome by a high sequence similarity screening search with the PF01397 and PF03936 TPS domain HMMs, which was also run against the genomes of 13 other plant species. Of the 81 putative OsaTPS genes, 19 were predicted to be partial genes and 15 were probable pseudogenes. OsaTPS genes that are partial or that have multiple frameshifts or stop codons were excluded from the main analysis. The 34 partial genes and pseudogenes were analyzed separately for terpene synthase subfamily classification: 15 partial genes fall in the TPS-a subfamily, 3 in TPS-b and 1 in TPS-e, while 8 pseudogenes fall in TPS-a, 3 in TPS-b, 2 in TPS-c, 1 in TPS-e and 1 in TPS-f. All partial genes and pseudogenes were retained for further study, as shown in S2 Table. The largest number of pseudogenes was found in the TPS-a subfamily. After manual curation and sequence filtration, only 47 putative functional OsaTPS genes were considered for further analysis (Table 2).
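As a quick consistency check on the counts reported above, the following Python sketch tallies the partial genes and pseudogenes by subfamily and confirms that they add up to the stated totals. The numbers are taken directly from the text; the variable names are illustrative only.

```python
from collections import Counter

# Subfamily counts as reported in the text.
partial = Counter({"TPS-a": 15, "TPS-b": 3, "TPS-e": 1})
pseudogenes = Counter({"TPS-a": 8, "TPS-b": 3, "TPS-c": 2, "TPS-e": 1, "TPS-f": 1})

INITIAL_HITS = 90   # OsaTPS genes found before filtering
EXCLUDED = 9        # removed for short sequence or stop codons
FUNCTIONAL = 47     # putative functional genes kept for the main analysis

n_partial = sum(partial.values())
n_pseudo = sum(pseudogenes.values())
selected = INITIAL_HITS - EXCLUDED

# Combined per-subfamily view of the 34 non-functional sequences.
non_functional = partial + pseudogenes

print(n_partial, n_pseudo, selected)
print(dict(non_functional))
```

The three categories (47 functional, 19 partial, 15 pseudogenes) account for all 81 selected genes, and TPS-a holds the largest share of the non-functional sequences.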

Intron-exon structure and organization of OsaTPS genes
In this study, two forms of TPS gene classification are presented: one based on the number of introns and exons in the TPS genes and the other based on the presence of conserved motifs. The intron-exon classification distinguishes three classes of terpene synthases: class I, class II and class III [59,63]. [...] f, and five OsaTPS fall in the TPS-g subfamily; all of these subfamilies are class I TPSs. In contrast, the five OsaTPS that fall within the TPS-c subfamily are class II terpene synthases.

Phylogenetic analysis of terpene synthase gene in Ocimum sanctum
The [...] (Fig 3). In the phylogenetic analysis, the OsaTPS genes fell into only six subfamily clades, which are presented in three trees: (i) TPS-a (Fig 4), (ii) TPS-b and -g (Fig 5), and (iii) TPS-c, -e and -f (Fig 6). TPS-d is missing from this analysis because it is gymnosperm specific, and TPS-h is specific to the spikemoss Selaginella moellendorffii [10].
In the TPS-a subfamily clade, O. sanctum TPS genes formed orthologous pairs with the closely related species Sorghum bicolor and Solanum lycopersicum and speciation clusters with distantly related species (Fig 4 and S2 Fig). In the tree, Arabidopsis thaliana and Sorghum bicolor have longer branch lengths than Ocimum sanctum and Solanum lycopersicum, which suggests a prolonged period of gene differentiation without gene duplication events. The TPS-a group consists of all the known angiosperm sesquiterpene and diterpene synthase proteins along with the predicted OsaTPS genes; it is therefore most likely that the 18 OsaTPS clustered with this group are also sesquiterpene and diterpene synthases.
The TPS-b and -g clusters of the phylogenetic tree contain 16 and 5 OsaTPS members, respectively; these clades consist of known angiosperm monoterpene or isoprene synthase genes (Fig 5 and S3 Fig). The TPS-b and TPS-g subfamilies are shown in a combined tree as separate clusters. All 5 OsaTPS-g subfamily genes formed orthologous pairs with S. lycopersicum and O. basilicum genes. TPS-b is further divided into two subclades, TPS-b1 and TPS-b2. Only Solly TPS3, 4, 5 and 7, which are monoterpene synthases, lie in the TPS-b1 subclade. The TPS-b2 subclade contains genes from multiple species: OsaTPS24 to OsaTPS34 formed a separate cluster with longer branch lengths, whereas OsaTPS20, 21, 22 and 23 formed a cluster with S. lycopersicum genes.
In O. sanctum, five genes fall in the TPS-c clade, two in TPS-e and one in TPS-f. These subfamilies are shown in a single phylogenetic tree (Fig 6), with the different TPS subfamily clades labeled as shown in S4 Fig.
Fig 3. [...] TPS-a, -b, -c, -e, -f and TPS-g. Only bootstrap values higher than 80% are shown in the tree; values below 80% are not shown. https://doi.org/10.1371/journal.pone.0207097.g003
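The subfamily counts given in this section can be summarized in a short Python sketch that checks they account for all 47 putative functional genes and splits them by TPS class. Treating every subfamily other than TPS-c as class I is an assumption made here on the basis of the class assignments stated in the text for TPS-c (class II) and for the remaining subfamilies.

```python
# Putative functional OsaTPS genes per subfamily, as reported in the text.
functional_by_subfamily = {
    "TPS-a": 18,
    "TPS-b": 16,
    "TPS-c": 5,
    "TPS-e": 2,
    "TPS-f": 1,
    "TPS-g": 5,
}

# Assumption for illustration: TPS-c is class II, all other subfamilies class I.
CLASS_II_SUBFAMILIES = {"TPS-c"}

total = sum(functional_by_subfamily.values())
class_ii = sum(n for sf, n in functional_by_subfamily.items() if sf in CLASS_II_SUBFAMILIES)
class_i = total - class_ii
largest = max(functional_by_subfamily, key=functional_by_subfamily.get)

print(f"total={total} class I={class_i} class II={class_ii} largest={largest}")
```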

Conserved sequence motif analysis of O. sanctum TPSs
All the discovered OsaTPS were investigated for the presence of the conserved terpene synthase motifs RR(X)8W, RXR, DXDD, DDXXD, and NSE/DTE; in addition to these, 20 more motifs were explored. Monoterpene and diterpene synthases typically contain an N-terminal plastidial targeting peptide upstream of the conserved or modified RR(X)8W motif. This motif was predicted in 31 OsaTPS genes in exact or modified form, which resemble sesquiterpene, diterpene and monoterpene synthases. In the TPS-b clade, the RR(X)8W motif was not conserved in all OsaTPS genes (Fig 7). The most conserved motif of the class I TPS gene family, DDXXD, was observed in 39 OsaTPS members and is also represented in the sequence motif logo (Fig 7). The DDXXD motif is present in the N-terminal region about 35 amino acids downstream of the RXR/RDR motif; it plays a role in the complexation of the diphosphate group after ionization of the substrate. At the C-terminal, a fully conserved RDR motif was found in most of the OsaTPS sequences, with some variations: R(D/H)R, some of [...] (Fig 7). The DXDD motif was found only in the OsaTPS-c subfamily (Fig 7), whose members share 62% to 68% sequence similarity with the TPS-c specific ent-copalyl diphosphate synthase. This motif is conserved in class II TPS genes, which lack the DDXXD motif and have a non-active α-domain [66]. The 47 discovered OsaTPS of O. sanctum were annotated against the UniProt protein database to determine their sequence similarity to the well-characterized terpene synthase genes available to date (Table 2). The classified OsaTPS subfamilies mapped substantially to TPS genes of different species of the Lamiaceae family (Ocimum basilicum, Origanum vulgare, Pogostemon cablin, Mentha x piperita, Salvia officinalis, Lavandula angustifolia) and of other species.
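A motif scan of this kind can be sketched with regular expressions, where X stands for any residue. The patterns for RR(X)8W, RXR, DXDD and DDXXD follow directly from the motif names in the text; the NSE/DTE pattern uses the commonly cited consensus (N,D)D(L,I,V)X(S,T)XXXE, which is an assumption not stated in the text. The toy sequences are invented, and `scan_motifs` and `infer_class` are hypothetical helpers rather than the tools used in the study.

```python
import re
from typing import Dict, Optional

# Regular expressions for the conserved TPS motifs ("." matches any residue).
MOTIFS = {
    "RR(X)8W": re.compile(r"RR.{8}W"),
    "RXR": re.compile(r"R.R"),
    "DXDD": re.compile(r"D.DD"),
    "DDXXD": re.compile(r"DD..D"),
    # Assumed consensus for NSE/DTE: (N,D)D(L,I,V)X(S,T)XXXE
    "NSE/DTE": re.compile(r"[ND]D[LIV].[ST].{3}E"),
}


def scan_motifs(protein: str) -> Dict[str, Optional[int]]:
    """Return the 0-based start of the first match of each motif, or None."""
    result: Dict[str, Optional[int]] = {}
    for name, pattern in MOTIFS.items():
        match = pattern.search(protein.upper())
        result[name] = match.start() if match else None
    return result


def infer_class(found: Dict[str, Optional[int]]) -> str:
    """Rough class call: DXDD without DDXXD -> class II; DDXXD present -> class I."""
    if found["DXDD"] is not None and found["DDXXD"] is None:
        return "class II"
    if found["DDXXD"] is not None:
        return "class I"
    return "unclassified"


# Invented toy sequences, not real OsaTPS proteins.
TOY_CLASS_I = "MSTRRSGNYQPSLWDAAARDRLFDDIYDVGGNDLMSHKEE"
TOY_CLASS_II = "AAADIDDAAA"

found = scan_motifs(TOY_CLASS_I)
print(found, infer_class(found))
print(infer_class(scan_motifs(TOY_CLASS_II)))
```

Exact-pattern matching like this only detects the canonical form of each motif; the modified forms mentioned in the text, such as R(D/H)R or DxD(D/V), would need correspondingly relaxed patterns.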

Functional annotation and pathway mapping of Ocimum sanctum TPS genes
A gene ontology search was carried out to find, by sequence homology, the hits associated with each query sequence and their respective Gene Ontology (GO) terms, Kyoto Encyclopedia of Genes and Genomes (KEGG) entries and Enzyme Commission (EC) codes; the hit with the highest bit score was selected. Annotation against the GO database yielded significant annotations for all 47 OsaTPS genes, representing the best hits. The OsaTPS genes were classified into the three major GO categories: Biological Process (BP), Cellular Component (CC) and Molecular Function (MF). In BP, all 47 OsaTPS genes participate in the biosynthetic process and the lipid metabolism process, whereas 35 of the 47 tend to participate in the catabolic process and 28 of the 47 participate in other processes. In CC, the OsaTPS were distributed across three components: 29 OsaTPS in the plastid, 6 in the cytosol and 5 in the cytoplasm. In MF, all 47 OsaTPS show lyase activity, 41 of the 47 show ion binding activity, and 12 show isomerase activity (Fig 8 and S4 Table).
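Selecting the highest bit score hit per query can be sketched as follows. The example assumes the homology search results are available in BLAST's standard 12-column tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), which the text does not specify; the hit identifiers and scores are invented, and `best_hits` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Invented rows in the standard 12-column BLAST tabular layout.
_ROWS = [
    ("OsaTPS1", "hit_A", "62.5", "540", "200", "3", "1", "540", "5", "544", "1e-180", "650.0"),
    ("OsaTPS1", "hit_B", "70.1", "538", "160", "2", "1", "538", "3", "540", "0.0", "720.5"),
    ("OsaTPS2", "hit_C", "55.0", "300", "135", "1", "10", "309", "12", "311", "2e-90", "310.2"),
]
SAMPLE_BLAST = "\n".join("\t".join(row) for row in _ROWS)


def best_hits(tabular_text: str) -> Dict[str, Tuple[str, float]]:
    """For each query, keep the subject with the highest bit score (last column)."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


best = best_hits(SAMPLE_BLAST)
for query, (subject, score) in sorted(best.items()):
    print(query, subject, score)
```

The GO, KEGG and EC annotations of the selected subject would then be transferred to the query; that lookup depends on the annotation database and is left out of the sketch.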
KEGG pathway mapping of the OsaTPS genes was performed to predict the secondary metabolite biosynthesis pathways in which they may act. In the monoterpenoid biosynthesis pathway, a total of 15 OsaTPS genes were found, which may be involved in the synthesis of particular secondary metabolites. Similarly, 8 OsaTPS genes were found in the diterpenoid pathway, whereas the largest number, 23 OsaTPS genes, was observed in the sesquiterpenoid biosynthesis pathway (Fig 9). OsaTPS19 shows homology with the isoprene synthase gene, whose product synthesizes isoprene from dimethylallyl diphosphate.
Fig 7. [...] All the conserved amino acids are highlighted in grey shades. The 47 OsaTPS are shown and classified into TPS gene subfamilies. The RR(X)8W motif is conserved in the OsaTPS-b subfamily except for OsaTPS19 (angiosperm monoterpene synthase). Variations of the RR(X)8W motif are found in the OsaTPS-a subfamily of sesquiterpene and diterpene synthases that have putative N-terminal plastid peptides; OsaTPS-c and -e lack the twin-arginine residues and retain only the tryptophan residue. The NSE/DTE motifs are consistently conserved except in the OsaTPS-c subfamily. The RXR motif is present in the same or a modified form in all but a few OsaTPS genes. The DDXXD and DXDD motifs are highly conserved in all putative functional O. sanctum TPS genes, but the DDXXD motif is absent in two genes, OsaTPS6 and OsaTPS25, which might be due to a sequence assembly error. https://doi.org/10.1371/journal.pone.0207097.g007

Discussion
Ocimum is a significant genus in the Lamiaceae family. It has been considered an important medicinal plant for centuries and is at present also used in the flavor and fragrance industries because of aroma compounds such as α-pinene, camphene, eugenol, limonene and camphor in its essential oil. Ocimum sanctum is rich in monoterpenes and sesquiterpenes, which are valuable to these industries. A lot of research on the phenylpropene class has been done in Ocimum species to characterize the genes responsible for the synthesis of eugenol, methyl chavicol, chavicol and their derivatives [25]. Despite its importance for human life, this species has not been explored much in the context of molecular characterization of terpene synthase genes. In this study, we used the genome data of Ocimum sanctum [67], which is 386 Mb in size, along with a plastid genome of 142,245 bp, and is the smallest known in the Lamiaceae family to date. In that study, the newly sequenced genome of O. sanctum was compared with that of Salvia miltiorrhiza. The comparison shows that the two genomes are the most similar and share an identical diploid chromosome number (2n = 16). It also shows that the O. sanctum genome is almost half the size of that of S. miltiorrhiza and appears to be relatively compact, with considerably fewer repeat sequences, while falling in the same phylogenetic clade as S. miltiorrhiza.
S. miltiorrhiza and O. sanctum are both rich in phenylpropanoids and their derivatives, which are well known for their therapeutic activities in the Chinese and Indian traditional medicine systems. In O. sanctum, however, the presence of a large number of terpene synthase homologs underlines the need to explore more specific and medicinally important terpene synthase genes in the plant. Building on this previous study, we methodically identified 47 putatively functional terpene synthases in O. sanctum that are responsible for the biosynthesis of volatile terpenes. These predicted O. sanctum TPSs were divided into six TPS subfamilies according to the standard TPS subfamily classification methods [6,68], and we extended the analysis by including the newly identified TPS sequences from the sequenced genome of Ocimum sanctum. Of the 47 OsaTPS genes, the largest number, OsaTPS1 to OsaTPS18, are members of the TPS-a subfamily. This clade is composed exclusively of angiosperm sesquiterpene and diterpene synthases. Its members contain the conserved DDxxD motif in the C-terminal domain, which is thought to bind metal ions, together with a less conserved RR(X)8W motif downstream of the N-terminal transit peptide, which is consistent with typical sesquiterpene and diterpene synthases. OsaTPS of the TPS-a group that lack the RR(X)8W motif do not have the twin-arginine residues but retain the conserved tryptophan residue.
In the TPS-a subfamily, members from O. sanctum either formed separate clusters or shared clusters with S. lycopersicum (Fig 4 and S2 Fig), and they show orthologous relationships with class I specific sesquiterpene synthases of the TPS-a subfamily, i.e., cadinene synthase, germacrene-D synthase, viridiflorene synthase, selinene synthase, bicyclogermacrene synthase and cis-muuroladiene synthase. The previous subfamily classification suggests that TPS-b members contain the conserved RR(X)8W motif, are monoterpene synthases and synthesize angiosperm monoterpenoids. In O. sanctum, OsaTPS20 to OsaTPS28 and OsaTPS31 contain a highly conserved RR(X)8W motif, which supports the prediction that these OsaTPS genes are monoterpene synthases. Other OsaTPS of this class lack conservation of this motif. In the TPS-b subfamily, OsaTPS shared common clusters with S. lycopersicum, S. bicolor and O. basilicum; all O. sanctum TPS genes from OsaTPS19 to OsaTPS34 show an orthologous relationship with TPS-b subfamily terpene synthase genes, as represented in S3 Fig. Five OsaTPS genes, OsaTPS35 to OsaTPS39, were observed in the TPS-c subfamily. These TPS genes contain only DxD(D/V) motifs and lack the DDxxD motif, and they are suggested to be monofunctional class II genes, specifically copalyl diphosphate related diterpene synthases; these features and the functional sequence annotation of the OsaTPS support their identification as copalyl diphosphate synthases.
The TPS-c, -e and -f subfamilies of OsaTPS were closely related to each other and shared a common cluster with N. tabacum and S. lycopersicum. TPS-e/f contains all valid kaurene synthase proteins from angiosperms, and OsaTPS40 is annotated among them; the presence of the DDxxD motif but not the DxDD motif, together with its annotation as a kaurene synthase TPS gene, supports its assignment as a class I TPS gene, as shown in S4 Fig. TPS-g is a clade closely related to the TPS-b subfamily (Fig 5) that lacks the RR(X)8W motif; its members are defined and classified as monoterpene synthases that produce acyclic floral aroma compounds such as ocimene and myrcene and were first characterized in snapdragon [64].
Plant secondary metabolism pathways produce a large number of specific compounds. These compounds do not play a direct role in plant growth and development but do help plants to survive in their environment. In the case of terpenoids, these metabolites are synthesized by specialized enzymes known as terpene synthases. In this study, we discovered a large number of terpene synthase genes in the O. sanctum genome. From a biological point of view, however, a plant with a large number of TPS genes will not necessarily synthesize a higher diversity of terpenoids, because a single terpene synthase can synthesize several terpenes from a single substrate. In the family Lamiaceae, only some TPS genes have been characterized to date. So far, only 11 TPS genes have been reported in Ocimum basilicum, of which nine were characterized [25]. In Lavandula angustifolia, only 10 TPS genes were found, of which only three were characterized. Likewise, in Mentha x piperita, only six TPS genes were found, and four have been functionally characterized so far. This comparison with closely related species shows that O. sanctum possesses a large number of putative functional TPS genes.
The chemical composition of O. sanctum essential oil also suggests the presence of a large number of terpenoids in the plant, which are synthesized by specific TPS genes, but few TPS gene studies have been done at the molecular level. Therefore, more TPS genes need to be characterized to establish their actual involvement in the biosynthesis of Ocimum sanctum secondary metabolites.

Conclusion
In the present study, we discovered and analyzed the terpene synthase gene families in the Ocimum sanctum genome using TPS gene HMMs, which are used to identify terpene synthase genes genome-wide. The study provides the first comprehensive prediction and annotation of the very large OsaTPS (O. sanctum TPS) gene family with respect to genomic structure, phylogenetic subfamily classification, motif localization and enzyme pathway mapping. The results show that the OsaTPS gene family is one of the largest gene families of specialized secondary metabolism in Ocimum species, for which there was no prior information on O. sanctum TPS genes. The predictions reveal a significant gene family that has expanded across the genome through gene duplication and functional diversification. Notably, the large number of functionally diverse sesquiterpene, diterpene and monoterpene synthases identified in our study may guide future research. The phylogenetic analysis of OsaTPS gave results similar to those for many of the plant species studied so far, and the genes lie within all six angiosperm TPS subfamilies, i.e., TPS-a, TPS-b, TPS-c, TPS-e, TPS-f, and TPS-g. This study may provide a deeper understanding of the phylogenetic classification of the putative functional O. sanctum genes and may help researchers in characterizing the discovered OsaTPS genes to enhance the metabolic processes of O. sanctum. It represents the first report of the identification of O. sanctum TPS genes and their subfamily classification. These findings are initial steps toward a better understanding of O. sanctum TPS genes. Further characterization is required to determine the proper function of these enzymes in secondary metabolism.