UPLC–MS retention time prediction: a machine learning approach to metabolite identification in untargeted profiling

  • Original Article
Metabolomics

Abstract

Metabolic profiling focuses on the analysis of a wide range of small endogenous molecules in order to understand the response of a living system to perturbations. Ultra-high-performance liquid chromatography–mass spectrometry (UPLC–MS) is a widely employed profiling tool, but its application is limited by difficulties in identifying detected metabolites. Herein, we demonstrate how the prediction of retention time can help resolve this major issue. We describe a general approach that enables the generation of reliable quantitative structure–retention relationship (QSRR) models tailored to specific chromatographic protocols. This methodology, applied to 442 experimentally characterised standards, employs a combination of random forest and support vector regression models with molecular interaction descriptors. In this unusual application, the VolSurf+ molecular descriptors demonstrated a high ability to describe chromatographic retention. On external validation sets, and for a wide range of chemical classes, predicted values were on average within 13 % of the experimentally observed retention time. More importantly, the presented procedure reduced the number of false putative identifications by more than 80 %, greatly improving metabolite identification. Furthermore, in 95 % of cases, the correct identification was promoted to within the top three metabolite suggestions. This retention time prediction framework can be replicated by different laboratories to suit their profiling platforms and enhance the value of their standard libraries by providing a new tool for compound identification.
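To make the identification step more concrete, the following is a minimal sketch of how predicted retention times might be used to discard and rank putative identifications that share the same accurate mass. It is not the authors' implementation: the `Candidate` class, the `rank_candidates` helper, the candidate names, all retention times, and the choice of 13 % as a rejection window are assumptions made purely for illustration (the abstract quotes 13 % as an average error, not as a filtering threshold).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A putative identification sharing the observed accurate mass."""
    name: str
    predicted_rt: float  # predicted retention time, in minutes


def relative_error(predicted: float, observed: float) -> float:
    """Absolute prediction error as a fraction of the observed retention time."""
    return abs(predicted - observed) / observed


def rank_candidates(candidates, observed_rt, tolerance=0.13):
    """Drop candidates outside the tolerance window, then sort by closeness.

    `tolerance` is a hypothetical cut-off, set here to the 13 % average
    error quoted in the abstract; the window used in the paper may differ.
    """
    scored = [(relative_error(c.predicted_rt, observed_rt), c) for c in candidates]
    kept = [(err, c) for err, c in scored if err <= tolerance]
    kept.sort(key=lambda pair: pair[0])  # smallest relative error first
    return [c for _, c in kept]


# Invented example: four isobaric candidates for a feature eluting at 4.0 min.
candidates = [
    Candidate("candidate A", 1.2),
    Candidate("candidate B", 4.3),
    Candidate("candidate C", 9.5),
    Candidate("candidate D", 3.9),
]
ranked = rank_candidates(candidates, observed_rt=4.0)
print([c.name for c in ranked])  # prints ['candidate D', 'candidate B']
```

In this toy case two of the four candidates fall outside the window and are rejected, and the remaining two are ordered by how closely their predicted retention time matches the observation. In practice the predicted values would come from a trained regression model rather than being typed in by hand.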


Acknowledgments

The authors would like to thank Dr. Bernard Walther (Director, Center of Excellence in PK) and Dr. Claire Boursier-Neyret (Head of Non-clinical PK) for their support during this research.

Author information

Corresponding author

Correspondence to Philippe Vayer.

Ethics declarations

Human and animal informed consent

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of interest

All authors declare that they have no conflict of interest.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 151 kb)

Cite this article

Wolfer, A.M., Lozano, S., Umbdenstock, T. et al. UPLC–MS retention time prediction: a machine learning approach to metabolite identification in untargeted profiling. Metabolomics 12, 8 (2016). https://doi.org/10.1007/s11306-015-0888-2

  • DOI: https://doi.org/10.1007/s11306-015-0888-2
