Published by De Gruyter June 9, 2022

UDM (Unified Data Model) for chemical reactions – past, present and future

Jarosław Tomczak, Elena Herzog, Markus Fischer, Juergen Swienty-Busch, Frederik van den Broek, Gabrielle Whittick, Michael Kappler, Brian Jones and Gerd Blanke

Abstract

The UDM (Unified Data Model) is an open, extendable and freely available data format for the exchange of experimental information about compound synthesis and testing. The UDM was initially developed in a collaborative project between Elsevier and Roche, in which chemical reaction data from a variety of disparate data sources at Roche were consolidated and integrated into the Roche in-house version of the Reaxys database. Elsevier adapted the UDM to its needs and eventually donated its pre-4.0 release to the Pistoia Alliance for further development together with the five project founders (Elsevier, Roche, BIOVIA, GSK and Novartis, later joined by BMS), who contributed funding and expertise to the Pistoia Alliance UDM project between 2017 and 2020. The latest version, UDM 6.0, was made freely available to the community under the MIT license in January 2021. This article discusses the past, present, and future of the UDM exchange format, as well as the factors that contribute to its successful adoption.


Article note:

A collection of invited papers on Cheminformatics: Data and Standards.



Corresponding author: Elena Herzog, Elsevier Information Systems GmbH, Frankfurt am Main, Germany, e-mail:

Funding source: BIOVIA

Award Identifier / Grant number: Unassigned

Funding source: BMS

Funding source: Elsevier

Funding source: GSK

Funding source: Novartis

Funding source: Roche

Funding source: Pistoia Alliance

Acknowledgments

The authors are grateful to the entire UDM team and community for stimulating discussions and their contributions. In particular, they thank Roman Affentranger from Novartis, who was the original UDM project champion at Roche, a role later taken over by Brian Jones. Hans Kraut from InfoChem (now DeepMatter) made a remarkable contribution through his expertise and the donation of a sample SPRESI dataset, which was converted to UDM and included in the release. Similarly, Elsevier donated a small dataset from the Reaxys database. Finally, Becky Upton and Nick Lynch from the Pistoia Alliance facilitated the change of the UDM license from the Pistoia Alliance-owned license to the MIT license and supported the transfer of the project to a broader community.

  1. Author contribution: All the authors have accepted responsibility for the entire content of this submitted manuscript and approved submission.

  2. Research funding: This work was supported by BIOVIA, BMS, Elsevier, GSK, Novartis, Roche and Pistoia Alliance.

  3. Conflict of interest statement: The authors declare no conflicts of interest regarding this article.


Published Online: 2022-06-09
Published in Print: 2022-06-27

© 2022 IUPAC & De Gruyter. This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. For more information, please visit: http://creativecommons.org/licenses/by-nc-nd/4.0/

Available at https://www.degruyter.com/document/doi/10.1515/pac-2021-3013/html