Discovery Methodology of Novel Conotoxins from Conus Species

Cone snail venoms are an ideal resource for discovering neuropharmacological tools and drug candidates, and they have become a research hotspot in neuroscience and new drug development. Cone snails produce more than 1,000,000 natural peptides, but fewer than 0.1% of the estimated conotoxins have been characterized to date. High-throughput and sensitive methods for discovering novel conotoxins from this huge resource are therefore crucial for conotoxin-based drug development. In this review, we introduce the methodology for discovering new conotoxins from various Conus species. We focus on obtaining full N- to C-terminal sequences, regardless of disulfide bond connectivity, through crude venom purification, conotoxin precursor gene cloning, venom duct transcriptomics, venom proteomics, and multi-omic methods. We summarize the protocols, advantages, disadvantages, and developments of the different approaches over the last decade and discuss their promising prospects.


Introduction
Cone snails (Conus) are carnivorous mollusks of the family Conidae (Figure 1). They live in tropical oceans around the world. Although they are slow-moving, they hunt fish (piscivorous), worms (vermivorous), or other molluscs (molluscivorous) for food [1]. To compensate for their slow movement when facing fast-moving prey, competitors, and/or predators, cone snails have evolved a specialized envenomation apparatus that delivers bioactive venom [2,3]. Cone snail venom peptides are secreted by the epithelial secretory cells of the long, convoluted venom duct [2,4]. Muscular peristalsis of the venom bulb pushes the venom into a harpoon-like radular tooth for envenomation [5]. Human casualties caused by cone snail stings in the 1960s [6] first drew researchers' interest to the toxicity and bioactivity of these venoms.
This review summarizes the methodology for discovering novel conotoxins from Conus species. It focuses on obtaining full N- to C-terminal sequences, regardless of disulfide bond connectivity, through crude venom purification, conotoxin precursor gene cloning, venom duct transcriptomics, venom proteomics, and multi-omic methods. We review the protocols, advantages, disadvantages, and developments of these approaches over the last ten years and discuss their prospects.

Diversity of Conotoxins
Conotoxins typically consist of 10 to 40 amino acid residues and contain two to four or more disulfide bonds. They are translated from mRNA as precursor proteins, which are processed into mature peptides in the endoplasmic reticulum (ER) and Golgi apparatus. A typical conopeptide precursor has three parts: a highly conserved ER signal region, a pro-region, and a hypervariable mature peptide region [52]. Conotoxins are classified into gene superfamilies according to the similarity of their ER signal sequences [35].
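The three-region architecture and signal-based superfamily assignment can be illustrated with a minimal Python sketch. The sketch assumes the region boundaries are already known; in practice they would come from a signal-peptide predictor. It also replaces a proper alignment with a simple ungapped identity score. The reference signal sequences, the superfamily labels, and the 0.7 threshold are all hypothetical placeholders, not real superfamily definitions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Precursor:
    """A conopeptide precursor split into its three canonical regions."""
    signal: str  # ER signal region (highly conserved)
    pro: str     # pro-region
    mature: str  # hypervariable mature peptide region


def split_precursor(seq: str, signal_end: int, pro_end: int) -> Precursor:
    """Split a precursor at 0-based region boundaries that are assumed to be known."""
    if not 0 < signal_end <= pro_end <= len(seq):
        raise ValueError("region boundaries out of range")
    return Precursor(seq[:signal_end], seq[signal_end:pro_end], seq[pro_end:])


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical positions over the shorter sequence (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def assign_superfamily(signal: str, references: dict[str, str],
                       min_identity: float = 0.7) -> str | None:
    """Return the superfamily whose reference signal is most similar, or None."""
    best_name, best_score = None, 0.0
    for name, ref in references.items():
        score = ungapped_identity(signal, ref)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_identity else None


# Toy usage: the sequences and superfamily labels below are hypothetical.
refs = {"superfamily_X": "MKLLVVLAG", "superfamily_Y": "MSTLGMTLL"}
p = split_precursor("MKLLVVLSG" + "DEKRR" + "GCCSDPRCAWRC", 9, 14)
print(assign_superfamily(p.signal, refs))  # superfamily_X
```

A real pipeline would use an alignment-based score (for example a BLAST hit or a profile model) rather than position-by-position identity. Even so, the decision rule is the same: take the best match and accept it only above a threshold.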
Conotoxin-encoding transcripts generate diverse precursors through hypermutation, fragment insertion/deletion, and mutation-induced premature termination [53]. A single precursor can give rise to many mature peptides through various posttranslational modifications (PTMs) and variable peptide processing (VPP). Together, these mechanisms create the exponential diversity of conopeptides [53,54]. For example, a venom duct study of Conus marmoreus detected and characterized an average of 20 conopeptide variants per precursor [49]. VPP refers to N- and C-terminal truncation of the precursor by proteolytic cleavage at alternative sites [53,54]. These variations interrupt, delete, or elongate partial sequences and cysteine frameworks, producing highly variable mature peptides or isoforms with multiple primary sequences [54].
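The following sketch makes the combinatorics of VPP concrete. It is a simplified model, not a published algorithm: it enumerates truncated forms of one mature sequence. The trimming limits and minimum length are arbitrary assumptions, and the α-conotoxin ImI mature sequence serves only as a convenient input.

```python
from typing import List


def vpp_variants(mature: str, max_n_trim: int = 2, max_c_trim: int = 2,
                 min_len: int = 5) -> List[str]:
    """Enumerate N- and C-terminally truncated forms of one mature sequence.

    A simplified model of variable peptide processing (VPP): each variant drops
    i residues from the N-terminus and j from the C-terminus, as if the precursor
    were cleaved at alternative sites. Duplicates are removed, order is kept.
    """
    variants = []
    for i in range(max_n_trim + 1):
        for j in range(max_c_trim + 1):
            variant = mature[i:len(mature) - j]
            if len(variant) >= min_len:
                variants.append(variant)
    return list(dict.fromkeys(variants))


# With up to two residues trimmed from each end, one mature sequence already
# yields up to 3 x 3 = 9 distinct processed forms, before any PTMs are applied.
forms = vpp_variants("GCCSDPRCAWRC")
print(len(forms))  # 9
```

Each of these forms can then carry different PTMs, so the number of possible mature peptides per precursor grows multiplicatively.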
PTMs are ubiquitous and play a key role in the structure and activity of conotoxins [55]. Many types of PTMs occur during conotoxin maturation, such as:

- oxidative folding (disulfide bond formation, the most common PTM);
- C-terminal amidation;
- hydroxylation of proline, valine, and lysine;
- γ-carboxylation of glutamate;
- cyclization of N-terminal glutamine;
- glycosylation, sulfation, bromination, and residue epimerization [53,55].

These diversification mechanisms (transcript variation, VPP, and PTMs) explain how thousands of distinct conopeptides are produced from a limited number of gene precursors in a single Conus species. They also account for inter- and intraspecific variability [49,53,56].
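Most of these PTMs change peptide mass by a characteristic amount, so MS-based assignment ultimately rests on arithmetic like the following sketch. The residue masses and PTM deltas are standard monoisotopic values. The dictionary keys naming each PTM are our own labels. Epimerization is omitted because it does not change mass, and glycosylation is omitted because its shift depends on the glycan.

```python
from typing import Dict, Optional

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once for the free N- and C-termini

# Monoisotopic mass changes (Da) for several PTMs found in conotoxins.
PTM_DELTA = {
    "disulfide": -2.01565,       # two H lost per S-S bond
    "amidation": -0.98402,       # C-terminal COOH -> CONH2
    "hydroxylation": 15.99491,   # e.g. Pro -> Hyp (+O)
    "carboxylation": 43.98983,   # Glu -> Gla (+CO2)
    "pyroglutamate": -17.02655,  # N-terminal Gln cyclization (-NH3)
    "bromination": 77.91051,     # +Br -H
    "sulfation": 79.95682,       # +SO3
}


def monoisotopic_mass(seq: str, ptms: Optional[Dict[str, int]] = None) -> float:
    """Neutral monoisotopic mass of a peptide plus counted PTM mass shifts."""
    mass = sum(RESIDUE_MASS[aa] for aa in seq) + WATER
    for name, count in (ptms or {}).items():
        mass += PTM_DELTA[name] * count
    return mass


# alpha-conotoxin ImI, modeled with two disulfide bonds and C-terminal amidation.
imi = monoisotopic_mass("GCCSDPRCAWRC", {"disulfide": 2, "amidation": 1})
print(round(imi, 3))  # 1350.484
```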

Conotoxins Purified from Crude Venom
Conotoxins have been isolated from the crude venom of cone snails since the 1970s [57] (Figure 2). The envenomation apparatus of the snail is dissected first. Only about 10 to 50 µL of crude venom can be squeezed from each specimen, so the dissected tissues are sometimes extracted directly instead. Tens to hundreds of snails may have to be dissected to obtain enough venom for conopeptide isolation. Because this sampling is non-renewable, it wastes precious cone snail resources, especially for rare species.
The purification process has remained almost unchanged for decades (Figure 2).

1. Crude venom or dissected tissue is extracted with aqueous acetonitrile containing 1% TFA.
2. The crude extract is fractionated by size exclusion chromatography (SEC).
3. The fractions are purified by C18 reversed-phase chromatography, using an acetonitrile/water gradient containing 0.1% TFA as the mobile phase.

Obtaining enough of a target conopeptide for subsequent characterization requires ample crude venom and rigorous purification skills.

The purified conotoxins are then de novo sequenced by Edman degradation or MS sequencing [81,88-90]. Sequencing is performed after sequential disulfide bond reduction, sulfhydryl alkylation, and enzymatic digestion, which make it much easier. PTMs are assigned with the aid of MS techniques [59,65,66,69-71,75-78,80,83,89-92]. The target conotoxins are then chemically synthesized by SPPS and oxidatively folded. HPLC co-elution of the synthetic peptide with the purified native conotoxin validates the sequencing result [69,73,75,76,81,83,84,89].

To identify the gene superfamily of a native peptide, its precursor genomic gene or cDNA can be cloned by various PCR methods or identified by venom transcriptome sequencing. The superfamily is then assigned according to the signal peptide homology of the precursor [31,62,65,68,74,75,79,82,86]. Conotoxins purified from Conus venom during the last ten years are summarized in Table 2.
In total, fifty conotoxins were discovered by crude venom isolation during the last ten years, an average of only five per year. This indicates that discovery by this route has stagnated, while more efficient omic studies have developed rapidly in recent years. Isolating a novel conotoxin is difficult because only a limited amount of crude venom is available and it contains more than 1000 venom peptides. A blind search strategy also makes native peptide isolation time-consuming and laborious. As a result, only a limited number of conotoxins were found, more or less at random, and they belong to a few gene superfamilies (Table 2). More effective bioassay-guided and MS sequence-tag-guided fractionation methods are therefore being developed to accelerate the discovery of novel native conotoxins from different crude venoms [62,86,87].
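The sequential reduction and alkylation performed before sequencing also offers a quick check on the number of cysteines, because each step adds a predictable mass. The sketch below makes three assumptions:

- iodoacetamide is the alkylating reagent (carbamidomethylation, +57.02146 Da per Cys); the reagent is not specified in the text;
- all cysteines are paired in disulfide bonds;
- a 0.02 Da tolerance and a 12-Cys search limit are acceptable illustrative choices.

```python
from typing import Optional

TWO_H = 2.01565             # Da gained per disulfide bond reduced (2 x H)
CARBAMIDOMETHYL = 57.02146  # Da added per alkylated Cys, assuming iodoacetamide


def reduction_alkylation_shift(n_cys: int) -> float:
    """Mass increase when n_cys disulfide-bonded Cys are reduced and alkylated."""
    if n_cys < 0 or n_cys % 2:
        raise ValueError("expects an even, non-negative number of paired Cys")
    return (n_cys // 2) * TWO_H + n_cys * CARBAMIDOMETHYL


def infer_cysteine_count(native_mass: float, modified_mass: float,
                         max_cys: int = 12, tol: float = 0.02) -> Optional[int]:
    """Return the even Cys count whose predicted shift matches the observed shift."""
    observed_shift = modified_mass - native_mass
    for n_cys in range(0, max_cys + 1, 2):
        if abs(observed_shift - reduction_alkylation_shift(n_cys)) <= tol:
            return n_cys
    return None


print(round(reduction_alkylation_shift(4), 5))  # 232.11714
```

Usage: a native peptide observed at 1350.4838 Da that shifts by about 232.117 Da after treatment is consistent with four cysteines forming two disulfide bonds.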

Gene Cloning to Discover New Conotoxins
Gene cloning for novel conotoxin discovery emerged in the 1990s to overcome the limitations of crude venom purification [94]. Because each conotoxin is encoded by a specific gene, it can be amplified by PCR with specific primers [53,95-98].
Genomic DNA is generally extracted from the tissue of an individual specimen. Alternatively, cDNA is prepared by reverse transcription of mRNA extracted from the dissected venom duct. The resulting genomic DNA or cDNA serves as the template for PCR amplification with forward and reverse primers, for example in 3′- and 5′-RACE. The primers are designed from conserved sequences of a known conotoxin precursor, taken from one of three places:

- the signal region (Figure 3, primer 1);
- the 5′- or 3′-untranslated regions (UTRs) (Figure 3, primers 2 and 3);
- its relatively conserved introns (Figure 3, primer 4).

The PCR products are purified by agarose gel electrophoresis and ligated into a plasmid vector for sequencing. Putative conotoxin-encoding genes are annotated by homology searching, and the resulting conotoxin sequences are analyzed and assigned with CLUSTALX [99]. Signal regions can be predicted with the SignalP 3.0 server (http://www.cbs.dtu.dk/services/SignalP/) [99-101].
Primers allow specific conotoxin-encoding genes to be amplified from the total genomic DNA or RNA of a cone snail, so primer design is the key factor in discovering conotoxins by gene cloning. A precursor amplified with primer 1 or primer 2 paired with primer 3 generally contains the complete open reading frame (ORF): a signal region, a pro-region, and a mature peptide region (Figure 3). When primer 4 is paired with primer 3, the amplified sequence begins within the pro-region and lacks the signal peptide. Table 3 shows representative α-family (α*-) conotoxins discovered by gene cloning during the last ten years. The cloned sequences of Pu14.1 and GeXIVA are complete precursors that include signal regions, which makes it easier to assign their gene superfamily.
A previous study showed that the intron sequence in the pro-region of α-conotoxin genes is highly conserved [97]. Our laboratory has discovered many new α-conotoxins, such as TxIB, TxID, and LvIA, by PCR with primers based on this conserved intron and the 3′-UTR; their cloned sequences lack signal regions (Table 3). In the same way, a forward primer and a paired reverse primer can be designed from the conserved intron and 3′-UTR of other known gene superfamilies, such as the A- or O-superfamily, to clone novel conotoxin precursor genes. Random cDNA sequencing can also yield complete precursor sequences (e.g., VxXXIVA), but it is less targeted than the strategy with carefully designed primers.
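The intron/3′-UTR primer strategy can be sketched in a few lines of Python. The forward primer is taken from a conserved intron site on the sense strand, and the reverse primer is the reverse complement of a conserved 3′-UTR site. The site sequences below are hypothetical. The Wallace rule (2 °C per A/T, 4 °C per G/C) gives only a rough melting temperature for short oligonucleotides; practical primer design usually relies on nearest-neighbor thermodynamic models.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(primer: str) -> float:
    """Fraction of G and C bases in a primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature (deg C) by the Wallace rule: 2(A+T) + 4(G+C)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def primer_pair(intron_site: str, utr3_site: str) -> tuple:
    """Forward primer = conserved intron site (sense strand);
    reverse primer = reverse complement of the conserved 3'-UTR site."""
    return intron_site.upper(), reverse_complement(utr3_site.upper())


# Hypothetical conserved sites, for illustration only.
fwd, rev = primer_pair("GTGCTGTTCTCCTCAGATGG", "AGCACACTCGAGAACACTGC")
for p in (fwd, rev):
    print(p, round(gc_content(p), 2), wallace_tm(p))
```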
Compared with crude venom purification, gene cloning saves resources: several specimens, or even one, are generally enough. However, mature peptide sequences are inferred from their precursor genes, so PTMs cannot be identified. Gene cloning is also relatively low-throughput compared with the transcriptomic approach that arose in the 2010s. In addition, the primers are designed from conserved sequences of known families or superfamilies, which makes conotoxins from new families or superfamilies difficult to discover this way.

Cone Snail Multi-Omics
Great efforts have been made to discover novel conotoxins from natural crude venom and by gene cloning, yet most of the estimated conotoxins remain uncharacterized [108]. More efficient, resource-saving, and high-throughput methodologies are urgently needed. "Omics" approaches such as transcriptomics and proteomics, and "multi-omics" approaches that integrate them, have opened a new era and greatly accelerated the rate of conotoxin discovery [108-110].

Transcriptomics-A Useful Pathway to Identify Putative Conotoxins
Transcriptomics aims to identify and profile the transcription and expression of all genes, including conotoxin-encoding genes, at the RNA level. The venom duct is an ideal material for transcriptomic analysis because it contains far more conotoxin-encoding transcripts, at far higher levels, than other tissues [50,111]. Conus venom duct transcriptomics can describe conotoxin expression and offers a useful way to rapidly identify putative conopeptide sequences. In addition, next-generation sequencing (NGS) technology [112] makes large-scale sequencing time- and cost-effective.
The transcriptomic pipeline (Figure 4) proceeds in the following steps.

1. Total RNA is extracted from the dissected venom duct.
2. The mRNA serves as the template for reverse transcription to construct a cDNA library.
3. PCR amplification is carried out with the cDNA as template and specific sequences as primers.
4. The library is sequenced on an NGS platform (Table 4).
5. The raw reads are assembled and cleaned to remove artifacts, low-quality reads, and redundant or aberrant sequences [114].
6. The trimmed sequences are translated into peptide primary sequences according to their open reading frames (ORFs) [112].
7. ConoPrec [1,53,54,111,115,116] or SignalP 4.0 [115-118] locates the signal peptides and predicts their cleavage sites.
8. ConoSorter [1,110,118,119] uses profile hidden Markov models (pHMMs) [1,111,118,119] to identify putative conopeptide precursors and classify them into superfamilies.
9. BLAST homology searches against online databases identify known and novel conotoxins. Suitable databases include ConoServer (The University of Queensland, Brisbane, Australia; http://research1t.imb.uq.edu.au/conoserver/) [120,121], UniProtKB/Swiss-Prot (http://www.uniprot.org/downloads) [122,123], and NCBI (http://www.ncbi.nlm.nih.gov/).

ConoSorter also reports relative sequence frequency, length, number of cysteines, N-terminal hydrophobicity, and sequence similarity scores [118]. In this way, a unique transcriptomic dataset can be established for an individual specimen of a given Conus species.
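The ORF-translation step can be sketched as below, assuming the standard genetic code and a trimmed, assembled contig as input. The 30-residue minimum ORF length is an illustrative assumption. Signal-peptide prediction and superfamily classification, the jobs of ConoPrec, SignalP, and ConoSorter, are not reproduced here.

```python
from itertools import product
from typing import Optional

# Standard genetic code (NCBI table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(nt: str) -> str:
    """Translate a nucleotide string in frame 0; unknown codons become 'X'."""
    nt = nt.upper()
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - 2, 3))


def longest_orf(transcript: str, min_aa: int = 30) -> Optional[str]:
    """Longest Met-initiated, stop-terminated ORF over all six reading frames."""
    forward = transcript.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]
    best = ""
    for strand in (forward, reverse):
        for frame in range(3):
            protein = translate(strand[frame:])
            # Only segments followed by a stop codon count as complete ORFs.
            for segment in protein.split("*")[:-1]:
                start = segment.find("M")
                if start != -1 and len(segment) - start > len(best):
                    best = segment[start:]
    return best if len(best) >= min_aa else None


print(longest_orf("GGATGAAATGTTGCTAACC", min_aa=3))  # MKCC
```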
Compared with traditional isolation and gene cloning, venom duct transcriptomics is a rapid, efficient, resource-saving, and high-throughput way to identify large numbers of conotoxins from different cone snails. It has greatly expanded our knowledge of conotoxin resources (Table 4). During the last decade, many putative conopeptide precursors have been identified from the transcriptomes of different Conus species (Table 4). For example, at least 30 conopeptide precursors were discovered from C. bullatus by transcriptome sequencing. Remarkably, sequencing the transcriptome of a single Conus episcopatus specimen revealed as many as 3305 novel conopeptide precursors (Table 4).
Phylogeny-based conotoxin discovery uses known conserved sequences to design specific PCR primers, which enables the discovery of more conopeptides belonging to known superfamilies across Conus species [124]. Specific PCR primers can also be designed from incomplete sequences obtained by MS sequence tags or Edman degradation [124,125]. Such primers have been used to clone new conotoxin precursors from the venom transcriptome, cDNA, or genomic DNA of various Conus species. This provides a feasible way to explore novel conotoxins belonging to new superfamilies [124]. cDNA library normalization is an effective and commonly used method to equalize cDNA abundance, which helps identify conotoxin genes with relatively low expression levels [114,124]. Normalization suppresses highly abundant transcripts and enriches rare ones, maximizing the number of unique conotoxins identified [119].
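Designing PCR primers from a partial peptide sequence, such as an MS sequence tag or an Edman read, requires back-translating it into a degenerate oligonucleotide. A minimal sketch follows, assuming the standard genetic code. It collapses each codon position to an IUPAC code independently, which is a simplification. Practical designs usually favor stretches of low-degeneracy residues such as Met and Trp.

```python
from itertools import product
from typing import Dict, List, Tuple

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Group the 64 codons of the standard genetic code by amino acid.
CODONS_FOR: Dict[str, List[str]] = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS_FOR.setdefault(aa, []).append("".join(codon))

# IUPAC nucleotide codes for each set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def degenerate_primer(peptide: str) -> Tuple[str, int]:
    """Back-translate a peptide tag into a degenerate (IUPAC) sense primer.

    Returns the primer and its degeneracy (number of distinct sequences it
    encodes). Each codon position is collapsed independently, so six-codon
    residues (Leu, Ser, Arg) are over-represented: Leu becomes YTN (8 variants).
    """
    primer, degeneracy = [], 1
    for aa in peptide.upper():
        codons = CODONS_FOR[aa]
        for pos in range(3):
            bases = frozenset(codon[pos] for codon in codons)
            primer.append(IUPAC[bases])
            degeneracy *= len(bases)
    return "".join(primer), degeneracy


print(degenerate_primer("CCW"))  # ('TGYTGYTGG', 4)
```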
Transcriptomic studies have shown that venom insulins, which target the heterospecific insulin receptors of prey, predators, and competitors, are expressed in many worm- and snail-hunting cone snails [126]. Six insecticidal conotoxins were screened out of a transcriptomic dataset of 215 precursors by homology search with α-conotoxin ImI and then validated [127]. These findings show that Conus transcriptomic databases can extend our knowledge and reveal diverse new conopeptides.
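As a rough illustration of reference-based screening, the sketch below keeps candidates that share the cysteine framework of α-conotoxin ImI (CC-C-C) and have a similar length. It is a crude stand-in for a BLAST-style homology search and does not reconstruct the cited study's method. The candidate sequences and the length tolerance are hypothetical.

```python
import re
from typing import Dict, List


def cysteine_framework(mature: str) -> str:
    """Collapse a mature sequence to its Cys pattern, e.g. GCCSDPRCAWRC -> CC-C-C."""
    return re.sub(r"[^C]+", "-", mature.upper()).strip("-")


def screen_by_framework(candidates: Dict[str, str], reference: str,
                        max_len_diff: int = 5) -> List[str]:
    """Keep candidates sharing the reference Cys framework and a similar length."""
    ref_framework = cysteine_framework(reference)
    return [name for name, seq in candidates.items()
            if cysteine_framework(seq) == ref_framework
            and abs(len(seq) - len(reference)) <= max_len_diff]


IMI = "GCCSDPRCAWRC"  # alpha-conotoxin ImI mature sequence
# Hypothetical candidate mature peptides, for illustration only.
candidates = {
    "cand_1": "GCCSHPACSVNHPELC",
    "cand_2": "CKGKGAKCSRLMYDCCTGSCRSGKC",
    "cand_3": "GCCSNPACAGR",
}
print(screen_by_framework(candidates, IMI))  # ['cand_1']
```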

Proteomics-An Effective Approach to Discovering Natural Conotoxins
Traditional proteomic identification relies on Edman degradation and amino acid composition analysis to assign peptide sequences. Because it consumes much sample and is low-throughput, it is difficult to apply widely. With the advent of high-resolution MS instruments [135], venom proteomics based on modern MS technology has proven to be an effective, high-throughput approach to novel conotoxin discovery [108,109].
The general proteomic procedure is presented in Figure 4. Briefly, venom is obtained in one of two ways:

- by squeezing it from the dissected venom duct (a one-off operation);
- by collecting the venom secreted by living cone snails in response to prey (a repeatable operation) [1].

As MS techniques develop, the amount of venom required for proteomic analysis decreases; about 7% or less of the crude venom from one specimen can be enough [136]. Proteomic data from different Conus species differ considerably, especially between snails that hunt different prey. This is because species identity and dietary preference are key factors in the evolution of venom diversity [137].
In traditional bottom-up proteomics, the venom sample is pretreated by reduction, alkylation, and enzymatic digestion before HPLC-MS analysis to eliminate the influence of disulfide bonds. This pretreatment, however, leads to partial loss of conopeptides [50,53,111,115,138]. In top-down proteomics, intact disulfide-bridged venom peptides are retained. The top-down approach is therefore better suited to simple peptide mixtures such as highly purified venom subfractions, whereas the bottom-up approach is better suited to complex crude venom [50,53,139]. Combining the two approaches has revealed several unexpected cleavage sites in conotoxin maturation [53].
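Bottom-up workflows depend on predicting which fragments the digestion enzyme will produce. The text does not name the enzyme, so the sketch below assumes trypsin and applies its usual rule: cleave after Lys or Arg, except before Pro. The option to allow missed cleavages is included because incomplete digestion is common.

```python
from typing import List


def trypsin_digest(seq: str, missed_cleavages: int = 0) -> List[str]:
    """In-silico tryptic digest: cleave after K or R unless the next residue is P.

    Peptides containing up to `missed_cleavages` uncleaved sites are also returned.
    """
    sites = [i + 1 for i, aa in enumerate(seq[:-1])
             if aa in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + missed_cleavages, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


print(trypsin_digest("AKRPGRC"))  # ['AK', 'RPGR', 'C']
```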
The large volumes of data generated by LC-MS analysis are processed and mined with bioinformatic tools.

- Raw MS data are submitted to Mascot for peptide mass fingerprinting.
- ProteinPilot™ [1,49,54,115] identifies sequences and annotates precursor ions by searching MS/MS mass lists acquired at relatively high mass accuracy [54]. Parameters for enzymatic digestion and various PTM types are entered into ProteinPilot so that it can identify PTMs and fragment splicing.
- ConoMass [49,115] and ProteinPilot can identify nearly all PTMs except glycosylation, which must be assigned by de novo sequencing.
- The sequences are searched against databases such as ConoServer, UniProtKB/Swiss-Prot, NCBI, and the laboratory's own transcriptomic datasets to identify known and novel conopeptides and their gene superfamilies.

The results are presented as peptide sequences with a series of statistics that profile the venom components.
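At its core, database matching compares measured masses with theoretical ones within a mass-accuracy tolerance expressed in parts per million (ppm). The sketch below shows only that comparison and is not how Mascot, ProteinPilot, or ConoMass work internally. The 10 ppm tolerance and the example masses are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_masses(observed: List[float], theoretical: Dict[str, float],
                 tol_ppm: float = 10.0) -> List[Tuple[float, str, float]]:
    """Pair each observed mass with every theoretical entry within tol_ppm."""
    hits = []
    for obs in observed:
        for name, theo in theoretical.items():
            err = ppm_error(obs, theo)
            if abs(err) <= tol_ppm:
                hits.append((obs, name, round(err, 2)))
    return hits


# Hypothetical observed neutral masses matched against one theoretical entry.
print(match_masses([1350.4850, 999.0], {"ImI-like": 1350.4838}))
```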
Advanced mass analyzers such as TOF, and especially quadrupole-TOF (Q-TOF), offer rapid acquisition, high resolution, first-class sensitivity, and excellent mass accuracy. Ionization methods (ESI, MALDI) and fragmentation methods (CID, ETD, EThcD) provide alternative mass data for different purposes. ESI and MALDI are generally used for proteomic profiling. CID, ETD, and EThcD are commonly used for de novo MS sequencing, because their different dissociation patterns yield complementary peptide fragments.

Better instruments, combining advanced mass analyzers with efficient ionization, detect more mature peptides. In the venom proteomics of Conus marmoreus, for instance [49]:

- MALDI-TOF revealed 2710 peptide sequences;
- ESI-Q-TOF with regular electrospray detected 3172;
- ESI-Q-TOF equipped with a DuoSpray ionization source disclosed 6254.

Conotoxin molecular masses often exceed the optimal detection range. ETD combined with targeted chemical derivatization has therefore been applied to increase the charge states of conopeptides and so extend the detectable mass range [132]. Combining superior mass analyzers with various ionization methods expands the boundary of the accessible venom repertoire. Modern venom proteomics thus supports both the rapid detection and characterization of specific conotoxins and the profiling of complex venom composition.
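Raising the charge state helps because of how m/z is computed. For an [M + zH]z+ ion, m/z = (M + z·1.007276)/z, so the more protons a peptide carries, the lower its m/z. The sketch below lists which charge states of a given neutral mass fall inside an m/z window. The 400 to 1500 window is a hypothetical example, not a property of any specific instrument.

```python
from typing import List

PROTON = 1.007276  # Da


def mz(neutral_mass: float, z: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + z * PROTON) / z


def charge_states_in_window(neutral_mass: float, low: float = 400.0,
                            high: float = 1500.0, max_z: int = 10) -> List[int]:
    """Charge states whose m/z falls inside a (hypothetical) detection window."""
    return [z for z in range(1, max_z + 1) if low <= mz(neutral_mass, z) <= high]


print(charge_states_in_window(3000.0))  # [3, 4, 5, 6, 7]
```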

Bioinformatics-An Efficient Tool for Massive Data Processing and Integration
Bioinformatics is an efficient tool for processing and integrating massive data. Through sophisticated analytical software, algorithms, and integrated databases, it now plays a central role in raw data processing, sequence identification, and superfamily classification [108,140]. Both venom duct transcriptomics and venom proteomics benefit from advances in bioinformatics, especially improved software and algorithms and the expansion of searchable databases. The functions of frequently used transcriptomic and proteomic tools are presented in Table 5. Transcriptomic and proteomic studies rely heavily on the underlying databases, which provide templates for sequence searching, matching, and annotation. Databases used for sequence identification and BLAST should be the latest versions. They should contain complete or partial natural precursor and mature toxin sequences derived from conotoxin genes, transcripts, or proteins, as well as synthetic conotoxins. In turn, novel sequences discovered by different approaches expand these databases.

Multi-Omics Integration
A comprehensive strategy called "multi-omics" or "venomics" [108,140], which integrates transcriptomics with proteomics through bioinformatics, has become popular in conotoxin research [110]. Venom duct transcriptomics and venom proteomics have both proven effective and high-throughput for identifying large numbers of conopeptide sequences. However, the sequences generated by transcriptomics are putative precursor peptides whose existence must be confirmed at the protein level. Furthermore, PTMs cannot be predicted from precursor sequences. Venom proteomics fills both gaps: it validates putative peptides at the protein level (Table 4) and identifies nearly all PTMs [49,128]. These validated sequence and PTM data help to elucidate the processing mechanisms (transcript variation, VPP, and PTMs) that convert precursor peptides (transcriptomic data) into the corresponding mature peptides (proteomic data).
Not every putative precursor can be validated by proteomic data, and likewise not every peptide sequence in proteomic data can be matched to a corresponding precursor (Table 4). For Conus episcopatus, the two datasets overlapped by only 9.98% [50]. Transcriptomic and proteomic datasets clearly differ significantly, and the overlapping (hit) sequences are few. How can the datasets be extended to reach the complete conotoxin repertoire? How can the variation be decoded so that more precursors are matched with their corresponding mature peptides? In theory, proteomics should detect more mature peptides than there are precursors, because one precursor can generate various mature conopeptides through different PTMs. In practice, the number detected varies greatly with the MS instrument, bioinformatic methods, fractionation, and sample pretreatment. Rare transcripts with low translation levels are also difficult to recognize, which adds to the disparity. Standardized processing protocols, reliable detection methods, dedicated integrated databases, and robust data analysis tools are therefore needed.
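An overlap figure such as 9.98% depends on the chosen denominator, which is not specified here. The sketch below reports the shared count relative to the union and to each dataset. It assumes exact string matching of mature sequences. Exact matching ignores VPP and PTM differences, so it would tend to understate the true correspondence.

```python
from typing import Dict, Set


def dataset_overlap(transcriptomic: Set[str], proteomic: Set[str]) -> Dict[str, float]:
    """Summarize the overlap between two sets of mature-peptide sequences."""
    shared = transcriptomic & proteomic
    union = transcriptomic | proteomic
    return {
        "shared": len(shared),
        "of_union": len(shared) / len(union) if union else 0.0,
        "of_transcriptomic": len(shared) / len(transcriptomic) if transcriptomic else 0.0,
        "of_proteomic": len(shared) / len(proteomic) if proteomic else 0.0,
    }


# Toy placeholder sequences, for illustration only.
print(dataset_overlap({"A", "B", "C", "D"}, {"C", "D", "E"}))
```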

Conclusions and Prospects
In this review, we introduced the methodology for discovering novel conotoxins from various Conus species. We focused on obtaining full N- to C-terminal sequences, regardless of disulfide connectivity, through crude venom purification, conotoxin precursor gene cloning, venom duct transcriptomics, venom proteomics, and multi-omic methods. We summarized and discussed the protocols, advantages, disadvantages, and developments of these approaches during the last decade, together with their promising prospects. Gene cloning was developed to overcome the limitations of crude venom purification, and it has for now slowed the depletion of native cone snail resources.
High-throughput omic and multi-omic strategies have greatly improved efficiency and opened a new era for conotoxin discovery. Transcriptomics and proteomics are now recognized as effective, resource-saving, and high-throughput approaches to novel conotoxin discovery, and multi-omic strategies are more efficient than either approach alone. Future efforts should decode and reduce the discrepancies between transcriptomic and proteomic data to expand the accessible repertoire of known conotoxins. The mechanisms of precursor processing also need to be elucidated. Standardized processing protocols, reliable detection methods, dedicated integrated databases, and robust data analysis tools for transcriptomic, proteomic, and multi-omic studies are therefore required to speed up novel conotoxin discovery.

Conflicts of Interest:
The authors declare no conflict of interest.