The Repository Chemotion: Infrastructure for Sustainable Research in Chemistry

The repository Chemotion addresses current challenges in storing research data in a practical manner. A main advantage of Chemotion is its comprehensive functionality, which offers options to collect, prepare, and reuse data with discipline-specific methods and data-processing tools.


Availability and requirements
Project homepage for ELN and repository: eln.chemotion.net
Web access to the repository installation at KIT: https://www.chemotion-repository.net/welcome
Videos and explanations on the Chemotion YouTube channel (ELN and repository): https://www.youtube.com/channel/UCWBwk4ZSXwmDzFo_ZieBcAw
Operating system(s): platform-independent access; developed/tested on Linux and Mac, deployed on Linux.
Other requirements for users: Modern internet browser supporting HTML5 and JavaScript.

Methods
The Chemotion Repository software is built on the Chemotion ELN software. It is a web application whose back-end server is built on the Ruby on Rails framework 1 with a PostgreSQL relational database, and whose front-end user interface is mainly constructed with the ReactJS framework 2 to serve single-page applications. The repository code is regularly updated with the latest developments of the Chemotion projects on GitHub by git rebasing. Features that are not essential to the repository functions are disabled and hidden from the end user. The current code is available through Zenodo. 3
The data from a Chemotion ELN instance is transferred to the repository over HTTPS, with the data serialized as a JSON object and the analysis files uploaded as MIME multipart attachments. The transfer request is authenticated and authorized using a JSON Web Token that the end user has previously fetched from their repository account and registered with their ELN account.
The IUPAC identifiers that are available through the repository for each entry are generated using an Open Babel implementation of the InChI software (v1.05). RInChI is generated using Rinchi-gem 4, a Ruby binding gem of the InChI core software (v1.05) and the reaction InChI/RInChI software (v1.00). 5
To register the publication DOI metadata with DataCite via the Metadata Store (MDS) API, the DataCite Metadata Schema version 4.3 is used. 6 The metadata of a publication can be accessed by machines either through the DataCite DOI services or directly from the repository service using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). 7
To register samples in the PubChem database, publication samples are registered as PubChem substances using the PubChem upload services. 8, 9 Technically, molecule structures are sent as molfiles via FTP. The molfiles are supplemented with a unique, repository-specific identifier. Upon review and acceptance by PubChem, they are assigned a PubChem substance ID (SID). Background jobs in the repository regularly query the PubChem substance and compound databases through the PubChem REST API to fetch the assigned SID for the sample and the compound ID (CID) for the molecule.
Figure S1. View of a typical review panel that summarizes the submitted data for the reviewers. The data can be checked and commented on directly, or the reviewer can access the workspace for a detailed view and data check.
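The background lookup of PubChem identifiers described in the Methods can be illustrated with a minimal Python sketch. It is not the repository's own implementation (which runs as background jobs in the Rails application); it only shows how the public PubChem PUG REST interface could be queried for a CID by InChIKey and for a SID by depositor name and registry identifier. The function names, the placeholder depositor name, and the assumed shape of the JSON response (an "IdentifierList" object) are illustrative assumptions.

```python
import json
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def cid_lookup_url(inchikey: str) -> str:
    """URL that asks PubChem for the compound IDs matching an InChIKey."""
    return f"{PUG_REST}/compound/inchikey/{quote(inchikey, safe='')}/cids/JSON"


def sid_lookup_url(source_name: str, source_id: str) -> str:
    """URL that asks PubChem for the substance IDs of a depositor's record.

    source_name is the depositor name and source_id the repository-specific
    identifier; both values are placeholders in this sketch.
    """
    name = quote(source_name, safe="")
    ident = quote(source_id, safe="")
    return f"{PUG_REST}/substance/sourceid/{name}/{ident}/sids/JSON"


def extract_ids(payload: dict, kind: str) -> list:
    """Read the 'CID' or 'SID' list from a response (assumed response shape)."""
    return list(payload.get("IdentifierList", {}).get(kind, []))


def fetch_ids(url: str, kind: str, timeout: float = 10.0) -> list:
    """Perform the request; an empty list means no identifier is assigned yet."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return extract_ids(json.load(response), kind)
    except HTTPError as error:
        if error.code == 404:  # PubChem has no matching record (yet)
            return []
        raise
```

A periodic job could call fetch_ids(sid_lookup_url(...), "SID") for each submitted sample until a non-empty list is returned, and then store the identifier with the sample.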

Description of the API and methods to transfer data to the repository
A user can programmatically transfer sample data to their Chemotion repository account. This requires an access token, which can be retrieved by visiting: https://www.chemotionrepository.net/pages/tokens
Figure S3. Adding the required URL to generate the token at the repository.
In the URL field, one has to enter the origin from which the transfer request will be sent. The access token can then be extracted from the URL of the redirected page.
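Extracting the token from the redirect URL can be sketched as follows. The text does not state how the token is embedded in the URL, so the parameter name "token" and the choice to check both the query string and the fragment are assumptions made for illustration only.

```python
from urllib.parse import parse_qs, urlsplit


def extract_token(redirect_url: str, param: str = "token"):
    """Return the value of `param` from the query string or fragment, or None.

    The parameter name is hypothetical; adjust it to whatever the redirected
    page actually uses.
    """
    parts = urlsplit(redirect_url)
    for component in (parts.query, parts.fragment):
        values = parse_qs(component).get(param)
        if values:
            return values[0]
    return None
```

The function returns None when no such parameter is present, so a caller can distinguish a missing token from an empty one.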
For example, data can be transferred using a curl command:
Attachment files can also be included by adding more parts to the multipart request, each with the UUID of the attachment as its key. (For example, with the previous data.json, one would add -F f49f252c-dd7f-4e0c-9fef-a6f043d0d431=@… to the curl request options.)
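The structure of such a multipart request can also be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the documented API: the endpoint URL passed in by the caller, the part name "data" for the JSON document, and the "Bearer" authorization header are hypothetical. Only the use of the attachment UUID as the part key is taken from the description above.

```python
import json
import uuid
from urllib.request import Request


def encode_multipart(parts):
    """Encode (name, filename, content_type, payload) tuples as multipart/form-data."""
    boundary = uuid.uuid4().hex
    chunks = []
    for name, filename, content_type, payload in parts:
        disposition = f'form-data; name="{name}"'
        if filename:
            disposition += f'; filename="{filename}"'
        header = (
            f"--{boundary}\r\n"
            f"Content-Disposition: {disposition}\r\n"
            f"Content-Type: {content_type}\r\n\r\n"
        )
        chunks.append(header.encode("utf-8") + payload + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_transfer_request(url, token, sample, attachments):
    """Build (but do not send) a POST request carrying sample JSON and files.

    attachments maps an attachment UUID to a (filename, bytes) pair, so each
    file part is keyed by its UUID. The part name "data" and the Bearer
    header are assumptions of this sketch.
    """
    parts = [("data", "data.json", "application/json",
              json.dumps(sample).encode("utf-8"))]
    for attachment_id, (filename, payload) in attachments.items():
        parts.append((attachment_id, filename, "application/octet-stream", payload))
    body, content_type = encode_multipart(parts)
    headers = {"Content-Type": content_type, "Authorization": f"Bearer {token}"}
    return Request(url, data=body, headers=headers, method="POST")
```

The returned object can be passed to urllib.request.urlopen to perform the transfer once the actual endpoint and part names are known.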

Information on entities that occur in more than one version
The repository's processes register different versions of the same entity "molecule". Entries such as samples are considered to belong to the same parent molecule if their InChIKeys are identical. New submissions referring to the same parent are treated as new versions, and the new samples, including their analytical data, are referenced with DOIs that contain a numeric version indicator. The version indicator is separated from the InChIKey descriptor in the DOI name by a dot. Example: