Turning Metabolomics into Drug Discovery

Metabolomics is the “omics” field that studies the whole metabolome. It has a wide range of applications, including chemotaxonomy, studies of environmental influences and agriculture. Here we review the application of metabolomics in natural product research. The importance of physicochemical properties to drug delivery is discussed in relation to turning metabolomic studies towards drug discovery. We believe that coupling metabolomic studies with standards of known physicochemical properties, used to calibrate the chromatographic columns, can help identify compounds of candidate drug quality.


Introduction
Metabolomics is the characterization and quantification of the large number of metabolites that occur in biological systems. Metabolomic approaches are increasingly used in quality control, 1 chemical ecology, 2 chemotaxonomy 3 and the identification of novel natural products. 4 Problems such as characterizing primary and secondary metabolites in order to link the constituents of an organism to its environment (geographical origin, genetics, climate, seasonal changes, etc.) require complete metabolite fingerprints, so that any differences between samples can be detected and hypotheses generated to explain them.
Primary and secondary metabolites are small molecules (MW < 1000 Da) 5 produced by living organisms through the action of biosynthetic enzymes. 6 Primary metabolites, which include amino acids, carbohydrates and lipids, are essential for the basic functioning of the organism. Secondary metabolites, on the other hand, are not required for basic life processes but contribute to the organism's survival, acting as defenses or as signals for communicating with its environment; they include terpenes, alkaloids, polyketides, hormones and polyphenols. 5 Metabolomics is also used for the dereplication of microorganisms, to detect the production of novel compounds. 7
Multivariate statistical analyses such as principal component analysis (PCA) are used in metabolomics to reduce multi-parametric data to a correlation pattern. PCA is an unsupervised method that can reveal correlations between samples and highlight differences or similarities between datasets. In addition to PCA, which describes the variance between samples, partial least squares-discriminant analysis (PLS-DA) and other targeted analytical profiling methods are used to establish a degree of "biological similarity" between samples. Statistical tools such as unsupervised PCA, supervised discriminant function analysis and Z-score analysis can be used for pattern recognition, which may predict that biological activity results from particular features of compounds. 8
Various chromatographic and spectroscopic techniques, mostly hyphenated ones such as liquid chromatography-mass spectrometry (LC-MS) and LC-nuclear magnetic resonance (LC-NMR), 9 are used in metabolomics, but high performance liquid chromatography (HPLC) is still the first choice for studying the chemical composition of crude natural extracts and for rapidly detecting known compounds. 10 In this mini-review article, we discuss the application of metabolomics using chromatographic analysis to study the whole metabolome, and how this information can be adapted to drug discovery.
Vol. 27, No. 8, 2016
Dereplication was used to identify and quantify tocopherols in Brazil nut oil from different geographic locations in Brazil using HPLC, revealing small differences in tocopherol content (between 144.80 and 234.26 mg g⁻¹). This study also showed the presence of the two main tocopherols (α- and β-tocopherol) in all of the authentic oil samples, in differing amounts, whereas some commercial Brazil nut oils contained no tocopherols at all. 12 In order to select superior banana genotypes for breeding, 29 samples from different genomic groups were subjected to dereplication to determine their phenolic contents and carotenoid profiles. The aim was to identify samples with high concentrations of carotenoids, some of which act as pro-vitamin A. The HPLC analysis showed appreciable amounts of pro-vitamin A carotenoids in the active germplasm samples compared with the main cultivars currently marketed. 13 Dereplication of natural products has also been used to study marine organisms: LC-photodiode array detector (PDA)-MS analysis of 14 Brazilian sponge specimens of the genus Aplysina, aimed at detecting bromotyrosine-derived metabolites from their UV absorption (on the basis of the three most typical chromophores known for Verongida dibromotyrosine-derived metabolites), 14 showed no significant difference in the chemical profiles of the samples. To differentiate the chemical composition of six Lippia species, an ultra-HPLC (UHPLC) metabolomic analysis was combined with the extraction of molecular formulae from time-of-flight (TOF)-MS data and the use of filters (including log P). 10 The fast separation capacity of UHPLC and the high quality of the chemical profiles obtained allowed the samples to be discriminated and gave a precise picture of their chemical relationships. In addition, the authors used hierarchical clustering analysis as an efficient approach to discovering cluster relationships between the samples. 10
Fifty-seven leaf extracts of Brazilian Asteraceae species were analyzed by HPLC-MS for dereplication and subjected to machine learning algorithms in order to determine biomarkers with anti-inflammatory potential. Using a genetic algorithm, 1241 of 6052 chromatographic peaks were selected according to their anti-inflammatory activities, allowing 11 biomarkers to be determined. 15 To distinguish coffee genotypes grown in three different regions of Brazil, gas chromatography-single quadrupole mass spectrometry (GC-Q/MS) was coupled with statistical analysis, which showed that some of the 44 metabolites identified can be used as chemomarkers for origin and genotype differentiation. 16 To identify analogues of isopimarane diterpenes in complex crude extracts of Velloziaceae, electrospray ionization tandem MS (ESI-MS/MS) was applied to a series of diterpenes isolated from the same family. The compounds differ in their number of hydroxyl groups, which was reflected in the mass spectra by multiple losses of water. The authors suggested that the intensities of the protonated and cationized molecules could be used to differentiate these compounds in complex mixtures. The MS/MS fragmentation did not distinguish between the isopimarane diterpenes themselves, but could be used to distinguish them from their isomers. 17
Because of the challenges encountered in identifying and characterizing all metabolites during metabolomic studies, a rational and sequential method was investigated in which 14 different solvents were evaluated for their efficiency in extracting Jatropha gossypifolia. Design of experiments (DoE) and partial least squares (PLS) were used to correlate the physicochemical properties of the solvents with the chromatographic profiles, showing that the physicochemical properties of the solvent can significantly influence extraction capacity. 18 Targeted and untargeted metabolomics were used to discover biomarkers and to search for novel active compounds against neglected diseases in Brazil (Chagas disease, dengue, leishmaniasis, leprosy, malaria, schistosomiasis and tuberculosis). 19 Most of the metabolites identified were amino acids, carbohydrates, lipids, organic acids, nucleosides and fatty acids. 19 Green and brown Brazilian propolis samples were collected from different regions of Brazil, analyzed by GC-MS and subjected to multivariate analysis in order to evaluate their chemical profiles and identify active compounds. The two types showed different chemical compositions: brown propolis was rich in triterpenoids, whereas green propolis was rich in sesquiterpenes and steroids. 20
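The hierarchical clustering step mentioned for the Lippia study can be illustrated with a small from-scratch sketch. It assumes a feature table in which each sample is a vector of peak intensities, and it uses average linkage on Euclidean distances; both are common choices but are assumptions here, not the settings of the cited study. The sample names and intensities are hypothetical, and in practice a library routine such as scipy.cluster.hierarchy.linkage would normally be used instead.

```python
import math
from itertools import combinations


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two peak-intensity profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles: dict[str, list[float]], n_clusters: int) -> list[set[str]]:
    """Agglomerative clustering with average linkage, stopped at n_clusters.

    Every sample starts in its own cluster; the pair of clusters with the
    smallest mean pairwise distance is merged until n_clusters remain.
    """
    clusters = [{name} for name in profiles]
    while len(clusters) > n_clusters:
        best = None  # (mean distance, index i, index j) with i < j
        for i, j in combinations(range(len(clusters)), 2):
            dists = [euclidean(profiles[a], profiles[b])
                     for a in clusters[i] for b in clusters[j]]
            mean_d = sum(dists) / len(dists)
            if best is None or mean_d < best[0]:
                best = (mean_d, i, j)
        _, i, j = best
        clusters[i] |= clusters[j]  # merge cluster j into cluster i
        del clusters[j]             # safe because j > i
    return clusters


# Hypothetical peak intensities for four samples from two collection sites.
samples = {
    "site1_a": [1.0, 0.9, 0.1],
    "site1_b": [0.9, 1.1, 0.0],
    "site2_a": [0.1, 0.2, 1.0],
    "site2_b": [0.0, 0.1, 1.2],
}
groups = average_linkage(samples, n_clusters=2)
print(sorted(sorted(g) for g in groups))
# prints [['site1_a', 'site1_b'], ['site2_a', 'site2_b']]
```

Stopping at a fixed number of clusters is one way to "cut" the tree; a full implementation would also record merge distances so that a dendrogram can be drawn.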

Chemometric Analysis: PCA
Because of the huge amount of data acquired during metabolomic studies, chemometric and statistical analysis is indispensable for data mining and visual interpretation. 21 PCA is one of the most widely used multivariate statistical tools in metabolomics. 16,22 It is a reliable and easy way to compare metabolomic profiles. ¹H NMR metabolomics combined with PCA has been shown to be very effective in discriminating samples: (i) it allowed the differentiation of 12 Cannabis sativa cultivars and proved very promising for the authentication and quality control of C. sativa; 23 and (ii) the same method allowed the discrimination of 11 Ilex species and the identification of arbutin, which had not previously been reported as a constituent of Ilex species and was found to be a biomarker for 8 of the species studied. 24 PCA combined with molecular networking and dereplication has proved valuable for researchers working on microbial extracts; molecular networking allows rapid comparison of the metabolite profiles of complex crude fermentation extracts, enabling effective chemical dereplication and the discovery of novel compounds. 25,26 Molecular networks were used for the first time to study the influence of bacterial isolation location (Scottish and Antarctic sediments) on bioactive secondary metabolite production in the marine environment. This study showed a high degree of biogeographic influence on secondary metabolite production; comparative metabolomics aided the dereplication of over 3500 parent ions from these marine sources and provided targets for further purification. 25
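To make PCA concrete, the sketch below computes first principal component (PC1) scores for a tiny hypothetical data set by mean-centring the data and applying power iteration to the covariance matrix. This is an illustrative, dependency-free approach chosen for this example; library implementations (for example scikit-learn's PCA) use singular value decomposition instead and return all components. No scaling beyond mean-centring is applied here, and the sign of a principal component is arbitrary, so only the relative positions of samples along PC1 are meaningful.

```python
import math
import random


def mean_center(rows: list[list[float]]) -> list[list[float]]:
    """Subtract each column (feature) mean; no further scaling is applied."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(rows: list[list[float]]) -> list[list[float]]:
    """Sample covariance matrix (features x features) of mean-centred rows."""
    n, p = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(p)]
            for i in range(p)]


def leading_eigenvector(matrix: list[list[float]], iterations: int = 200,
                        seed: int = 0) -> list[float]:
    """Power iteration: repeatedly multiply and normalise a random start vector."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in matrix]
    for _ in range(iterations):
        w = [sum(m * x for m, x in zip(row, v)) for row in matrix]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def pc1_scores(rows: list[list[float]]) -> list[float]:
    """Project each mean-centred sample onto the first principal component."""
    centred = mean_center(rows)
    loadings = leading_eigenvector(covariance(centred))
    return [sum(x * w for x, w in zip(r, loadings)) for r in centred]


# Hypothetical binned spectral intensities: two cultivars, two replicates each.
profiles = [
    [5.0, 1.0, 0.5],  # cultivar X, replicate 1
    [5.2, 1.1, 0.4],  # cultivar X, replicate 2
    [1.0, 4.8, 0.6],  # cultivar Y, replicate 1
    [0.9, 5.1, 0.5],  # cultivar Y, replicate 2
]
scores = pc1_scores(profiles)
print([round(s, 2) for s in scores])  # the two cultivars fall on opposite sides of PC1
```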

Metabolomics and Drug Discovery: What Makes a Drug? (Importance of Physicochemical Properties)
Metabolomic studies aim to study the whole metabolome, to identify known compounds or to characterize a class of compounds, sometimes without isolating the metabolites. A drug, on the other hand, is a single compound that has to be delivered to the patient and reach a suitable site of action to treat disease. Progressing from metabolomic analysis to drug discovery therefore requires knowledge of physicochemical properties. A drug should have certain physicochemical properties, defined by Lipinski et al. 27 as the rule of five: a set of four simple physicochemical criteria for oral bioavailability, namely MW ≤ 500 Da, log P ≤ 5, H-bond donors ≤ 5 and H-bond acceptors ≤ 10. More recently, data on the causes of attrition, toxicology and safety of 812 oral small-molecule drug candidates from AstraZeneca, Eli Lilly, GlaxoSmithKline (GSK) and Pfizer between 2000 and 2010 were pooled, and a set of physicochemical descriptors was calculated for these candidates in order to establish links between physicochemical properties and compound attrition. Attrition was shown to be greater for more lipophilic compounds. In addition, most compounds fell within desirable ranges, with 75% having MW < 499 Da and 75% having log P < 4.4. 28 Turning metabolomics towards drug discovery might be achieved in a very simple way: our hypothesis is that a mixture of compounds with known physicochemical properties can be used to calibrate the HPLC column and to limit the region of the metabolome to be considered when progressing down a drug discovery pathway.
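The rule of five lends itself to a simple computational filter. The sketch below checks the four thresholds quoted above for compounds whose descriptors are already known (measured or calculated); the Compound class and the example values are hypothetical. Lipinski et al. described poor absorption as more likely when two or more criteria are violated, which is why the default here tolerates one violation; passing max_violations=0 gives a stricter filter.

```python
from dataclasses import dataclass


@dataclass
class Compound:
    """Hypothetical container for pre-computed or measured descriptors."""
    name: str
    mw: float     # molecular weight (Da)
    log_p: float  # octanol/water partition coefficient (log10)
    hbd: int      # hydrogen-bond donors
    hba: int      # hydrogen-bond acceptors


def ro5_violations(c: Compound) -> list[str]:
    """Return the rule-of-five thresholds that the compound exceeds."""
    checks = {
        "MW > 500 Da": c.mw > 500,
        "log P > 5": c.log_p > 5,
        "H-bond donors > 5": c.hbd > 5,
        "H-bond acceptors > 10": c.hba > 10,
    }
    return [rule for rule, failed in checks.items() if failed]


def passes_ro5(c: Compound, max_violations: int = 1) -> bool:
    """True if the number of violated criteria does not exceed max_violations."""
    return len(ro5_violations(c)) <= max_violations


candidates = [
    Compound("hypothetical_1", mw=350.4, log_p=2.1, hbd=2, hba=5),
    Compound("hypothetical_2", mw=720.0, log_p=6.3, hbd=4, hba=12),
]
for c in candidates:
    print(c.name, passes_ro5(c), ro5_violations(c))
# hypothetical_1 True []
# hypothetical_2 False ['MW > 500 Da', 'log P > 5', 'H-bond acceptors > 10']
```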

Retention Time (tR) and log P
Upon suitable calibration, HPLC as used for metabolomics can serve as a surrogate for log P measurement. Retention parameters have been predicted 29 using chemoinformatics, principally quantitative structure-retention relationship (QSRR) studies, which establish statistical correlations between chromatographic retention time and the theoretical properties of metabolites. 29 Retention time (tR) is correlated with the physicochemical parameters of a given compound, specifically with log P. Retention time estimation based on calculated log P is increasingly used; for example, it allowed isomeric flavonoid aglycones, which eluted at different retention times and had different log P values, to be differentiated without isolation. 10 However, for more hydrophobic molecules, GSK showed that measuring the chromatographic hydrophobicity index (CHI) might be more reliable than measuring log P as the standard model in drug discovery; CHI has also been shown to be linear, independent of solubility, predictable and relevant. 30
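The calibration idea can be sketched as an ordinary least-squares line fitted to standards of known log P, which is then used to estimate log P from the retention time of an unidentified peak. This assumes a linear tR-log P relationship within the calibrated range under fixed chromatographic conditions, which is a simplification of the QSRR models cited above. The standard names, retention times and log P values are hypothetical; the same fitting step could in principle be applied to standards with known CHI values instead of log P.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(xs: list[float], ys: list[float], slope: float, intercept: float) -> float:
    """Coefficient of determination of the fitted calibration line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical calibration standards: name -> (retention time in min, known log P).
standards = {
    "std_A": (1.8, 0.5),
    "std_B": (4.1, 1.5),
    "std_C": (6.0, 2.4),
    "std_D": (8.2, 3.6),
}
t_r = [t for t, _ in standards.values()]
log_p = [lp for _, lp in standards.values()]
slope, intercept = fit_line(t_r, log_p)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, "
      f"R^2={r_squared(t_r, log_p, slope, intercept):.3f}")

# Estimated log P for a hypothetical unidentified peak eluting at 5.0 min.
print(f"estimated log P at 5.0 min: {slope * 5.0 + intercept:.2f}")
```

Estimates are only meaningful inside the retention-time range spanned by the standards; extrapolating beyond the most retained standard is riskier.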
The physicochemical properties of a given compound are critical to its delivery and metabolism. Metabolomics characterizes a wide range of metabolites; using standards with known physicochemical properties (log P or CHI) to calibrate the chromatographic columns can help identify compounds of candidate drug quality (Figure 1).
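Under the same linear-calibration assumption, the proposed use of calibrated columns can be sketched as a filter: invert the calibration line to turn a log P ceiling (here the rule-of-five value of 5) into a retention-time window, then keep only metabolome features that elute inside that window and fall under a mass limit. The slope and intercept, the Feature fields and the feature list are all hypothetical, and neutral masses are assumed to have already been derived from the MS data.

```python
import math
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical metabolome feature from an LC-MS run."""
    feature_id: str
    rt_min: float        # retention time (min)
    neutral_mass: float  # assumed already derived from the MS data (Da)


def rt_window(slope: float, intercept: float,
              log_p_min: float, log_p_max: float) -> tuple[float, float]:
    """Invert log P = slope * tR + intercept (slope must be non-zero)."""
    t1 = (log_p_min - intercept) / slope
    t2 = (log_p_max - intercept) / slope
    return min(t1, t2), max(t1, t2)


def drug_like_features(features: list[Feature], slope: float, intercept: float,
                       log_p_max: float = 5.0, log_p_min: float = -math.inf,
                       max_mass: float = 500.0) -> list[Feature]:
    """Keep features whose calibrated log P and mass fall within the limits."""
    t_lo, t_hi = rt_window(slope, intercept, log_p_min, log_p_max)
    return [f for f in features
            if t_lo <= f.rt_min <= t_hi and f.neutral_mass <= max_mass]


# Hypothetical calibration: log P = 0.5 * tR - 0.5 (from column standards).
SLOPE, INTERCEPT = 0.5, -0.5
features = [
    Feature("F1", rt_min=3.0, neutral_mass=320.1),
    Feature("F2", rt_min=12.5, neutral_mass=410.2),  # estimated log P 5.75
    Feature("F3", rt_min=6.0, neutral_mass=860.4),   # above the mass limit
    Feature("F4", rt_min=0.6, neutral_mass=150.0),
]
kept = drug_like_features(features, SLOPE, INTERCEPT)
print([f.feature_id for f in kept])  # ['F1', 'F4']
```

A stricter window, such as the log P < 4.4 range reported for most pooled candidates, only requires changing log_p_max.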