ProteomeXchange submissions via PRIDE

What is ProteomeXchange?
ProteomeXchange (PX) provides a standard way of submitting mass spectrometry [3]-based proteomics [4] data to public-domain repositories. Once you have submitted your data to the PX entry point, it will be automatically disseminated to all other repositories in the consortium [1 [5]]. ProteomeXchange simplifies the submission and distribution of proteomics data. It has a centralised infrastructure that allows:

In addition to the databases, ProteomeCentral [19] generates a unique identifier [20] for each ProteomeXchange dataset and acts as a registry for all ProteomeXchange submissions, irrespective of the receiving repository. This queryable archive gives users an efficient way to identify datasets of interest. For instance, it can be used to monitor the re-use of particular datasets, as well as the volume and impact of the ProteomeXchange data exchange.
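As an illustration of how such identifiers might be handled in a script, the following minimal sketch checks whether a string looks like a PX identifier and builds a link to the matching ProteomeCentral record. Both the identifier pattern ("PXD" followed by six digits, for example PXD000001) and the dataset URL are assumptions made for this example and are not taken from this course; check them against the ProteomeXchange website before relying on them.

```python
import re
from urllib.parse import urlencode

# Assumption: PX identifiers are "PXD" followed by six digits (e.g. PXD000001).
PX_ID_PATTERN = re.compile(r"PXD\d{6}")

# Assumption: dataset landing pages are served from this ProteomeCentral endpoint.
PROTEOMECENTRAL_DATASET_URL = (
    "http://proteomecentral.proteomexchange.org/cgi/GetDataset"
)


def is_px_identifier(text: str) -> bool:
    """Return True if the text looks like a PX identifier (case-insensitive)."""
    return PX_ID_PATTERN.fullmatch(text.strip().upper()) is not None


def dataset_url(px_id: str) -> str:
    """Build the (assumed) ProteomeCentral URL for one PX identifier."""
    normalised = px_id.strip().upper()
    if not is_px_identifier(normalised):
        raise ValueError(f"not a PX identifier: {px_id!r}")
    return f"{PROTEOMECENTRAL_DATASET_URL}?{urlencode({'ID': normalised})}"


if __name__ == "__main__":
    print(dataset_url("pxd000001"))
```

Normalising the case before matching means identifiers copied from a manuscript in lower case are still accepted.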

What is PRIDE?
The PRIDE [21] (PRoteomics IDEntifications) database at the European Bioinformatics Institute (EBI) is a centralised, standards-compliant public database containing MS-based proteomics data. It was originally developed to provide the proteomics community with a public repository for peptide and protein identifications, together with the evidence supporting these identifications [2 [5], [22]3].

Published on EMBL-EBI Train online (https://www.ebi.ac.uk/training/online)
Other files related to one submission: gel images, scripts, etc.

Why do we need PRIDE?
The PRIDE database is one of the main public repositories for proteomics data that have been generated by MS approaches. In addition, PRIDE is leading the ProteomeXchange consortium, accounting for >85% of all the submissions.

Various journals in the field strongly support, and some even mandate, deposition of MS-based proteomics data into a public proteomics repository such as PRIDE (or others from the ProteomeXchange consortium). This enables researchers to understand, and potentially reproduce, experiments described in a particular publication. It also allows you to reanalyse mass spectra data as protein sequence databases and search-engine toolkits improve.
In addition, PRIDE provides proteomics data to other data resources such as UniProt [26].
ProteomeCentral (Figure 3) is an extra component of the consortium pipeline. It generates a unique identifier [20] for each ProteomeXchange data submission and acts as a registry for all ProteomeXchange submissions. This queryable archive provides an efficient way to identify and monitor public datasets of interest (for example, you can monitor the re-use of particular datasets). ProteomeCentral also provides an efficient way to monitor the volume of data submitted.

of a PX submission.
- The PX XML messages available in ProteomeCentral allow secondary data resources to evaluate and integrate data.

How to submit data to ProteomeXchange via PRIDE
In order to submit your MS/MS data to ProteomeXchange via PRIDE [10], you should use the ProteomeXchange submission tool [28].
There are two pipelines for submitting to PX, depending on the supporting information available for each submission:
- Complete submission
- Partial submission
The PX submission tool can be used in both cases. For larger datasets, it is possible to use a command-line alternative based on a faster file transfer system called Aspera.
Once you have submitted your data, they can be held privately, allowing reviewers and journal editors access if desired. Since PRIDE is a public data repository, datasets will be made public upon publication of the associated manuscript that cites the particular PX identifier [20].
Next we will explore both pipelines.

Complete submissions
You can use the PX submission tool [32] to submit your datasets to the complete submissions pipeline. You will need to provide all the raw data [17], and all applicable related metadata [25] to support your submission. All processed identification result files will need to be converted to mzIdentML [33] (version 1.1) format and the corresponding peak list [24] files must be provided as well.
There are different tools to convert or export files to mzIdentML (see the previous section). To allow PRIDE to keep your submission secure, you will need a PRIDE login. To get a username and password, you will need to register [34].

The submission is also manually checked by one of PRIDE's curators. We consider this to be essential to ensure quality control of complete submissions. Please check the ProteomeXchange website [37] for more information.
The PX submission tool creates the appropriate relationships between the different types of files included in your submission (the raw data and the results are mandatory; the others are optional). It will allow you to add extra metadata for your dataset before submitting all the files to PRIDE, thereby producing more contextual datasets (Figure 4).
The complete list of data formats supported by PRIDE can be found in Appendix 2 [38].

Partial submissions
Although the option for Partial Submissions is available, the recommended first option is a Complete Submission (see previous page), as it significantly improves the reusability of your dataset. Partial Submissions should be used when search results cannot be converted or exported to mzIdentML files.
For newer methods, like DIA (data-independent acquisition) or MS Imaging [39], a Partial Submission is the only option. It might also be the suitable option for niche techniques and methods, like top-down proteomics [4].
As a result, you will be issued with a PX identifier [20] but not with a DOI. In addition, your dataset will not be fully integrated in PRIDE, but will be searchable based on metadata [25], and the corresponding files will be made available on the PRIDE FTP [40] server to download.
The Partial Submission pipeline requires you to have the raw data [17] (mass spectrometer output files [18]), search engine output files [41] (output from a search engine or analysis pipeline) and a PRIDE login, so you will need to register [42] (Figure 5). Other file types can be included optionally (quantification output files, peak list [24] files, scripts, gel images, etc.). Your username and password will allow PRIDE to securely process your submission.
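The requirements of the two pipelines can be summarised as a small decision helper. This is only a sketch of the rules as described in this course (raw data are always needed; a Complete Submission also needs mzIdentML results and peak list files; methods such as DIA or MS Imaging can only use a Partial Submission). The class and function names are hypothetical and are not part of the PX submission tool.

```python
from dataclasses import dataclass


@dataclass
class SubmissionInputs:
    """What the submitter has available (hypothetical helper structure)."""
    has_raw_data: bool
    has_search_engine_output: bool
    has_mzidentml_results: bool = False
    has_peak_lists: bool = False
    partial_only_method: bool = False  # e.g. DIA or MS Imaging


def choose_pipeline(inputs: SubmissionInputs) -> str:
    """Return 'complete' or 'partial', or raise if neither pipeline applies."""
    if not inputs.has_raw_data:
        raise ValueError("raw data are required by both pipelines")

    complete_ready = (
        inputs.has_mzidentml_results
        and inputs.has_peak_lists
        and not inputs.partial_only_method
    )
    if complete_ready:
        # Complete is the recommended first option whenever it is possible.
        return "complete"

    if inputs.has_search_engine_output or inputs.has_mzidentml_results:
        return "partial"

    raise ValueError("search engine output files are required")


if __name__ == "__main__":
    example = SubmissionInputs(
        has_raw_data=True,
        has_search_engine_output=True,
        has_mzidentml_results=True,
        has_peak_lists=True,
    )
    print(choose_pipeline(example))
```

The helper prefers the complete pipeline whenever its inputs are present, mirroring the recommendation that a Complete Submission should be the first choice.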

Metadata requirements for MS/MS submissions
Proteomics data is substantially enriched when it is accompanied by sufficient metadata. The repositories in PX are quite flexible about how much experimental metadata they will accept. However, there are some minimal requirements. Below we summarise what you must supply when submitting your data to PX. We recommend that you add extensive metadata in the interest of making your dataset as useful as possible in the future.
- Quantification method (if applicable).

The following details are optional:
Sample annotation:
- Cell type. Use the "Cell Type" ontology (CL).
- Disease. Use the "Human Disease" ontology (DOID).
Dataset details:
(a) Your dataset is part of a bigger project or effort (for instance, the Human Proteome Project or 'PRIME-XS'). Tagging your dataset in this way allows it to be grouped with the rest of that project.
(b) There is already a PubMed ID associated with it (the data have already been published).
(c) Your dataset represents a reanalysis of an earlier public PX dataset.
(d) There are other related "omics" datasets (for instance, transcriptomics or metabolomics data present in other repositories) that can be associated with it.
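The optional details above could be collected and sanity-checked before starting a submission, for example with a small record like the one sketched here. The field names are hypothetical and do not correspond to fields in the PX submission tool. The identifier patterns (ontology terms written as "CL:" plus seven digits or "DOID:" plus digits, and PX identifiers as "PXD" plus six digits) are assumptions made for this example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed identifier shapes, used only for light sanity checks.
CL_TERM = re.compile(r"CL:\d{7}")
DOID_TERM = re.compile(r"DOID:\d+")
PX_ID = re.compile(r"PXD\d{6}")


@dataclass
class OptionalAnnotations:
    """Optional sample and dataset details (hypothetical field names)."""
    cell_types: List[str] = field(default_factory=list)     # CL terms
    diseases: List[str] = field(default_factory=list)       # DOID terms
    project_tag: Optional[str] = None                       # e.g. "PRIME-XS"
    pubmed_id: Optional[str] = None
    reanalysis_of: Optional[str] = None                     # earlier PX dataset
    related_omics: List[str] = field(default_factory=list)  # other accessions

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means none were found."""
        issues = []
        for term in self.cell_types:
            if not CL_TERM.fullmatch(term):
                issues.append(f"cell type is not a CL term: {term!r}")
        for term in self.diseases:
            if not DOID_TERM.fullmatch(term):
                issues.append(f"disease is not a DOID term: {term!r}")
        if self.pubmed_id is not None and not self.pubmed_id.isdigit():
            issues.append(f"PubMed ID should be numeric: {self.pubmed_id!r}")
        if self.reanalysis_of is not None and not PX_ID.fullmatch(self.reanalysis_of):
            issues.append(f"not a PX identifier: {self.reanalysis_of!r}")
        return issues


if __name__ == "__main__":
    notes = OptionalAnnotations(cell_types=["CL:0000236"], pubmed_id="12345678")
    print(notes.problems())
```

Returning a list of issues, instead of raising on the first one, lets a submitter fix all annotation problems in a single pass.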

Submission to ProteomeXchange via PRIDE using the PX submission tool
The following slideshow provides a step-by-step walk-through of the submission process using the ProteomeXchange submission tool.

ProteomeXchange submission tool
The PX submission tool is a stand-alone desktop application that handles both complete and partial MS/MS proteomics submissions to ProteomeXchange (Figure 6). This tool will guide you through the whole submission process.
To run the tool, choose the download [44] option directly on the PRIDE Submission Guidelines [45] page.

Create a PRIDE account
To start your submission, you will be prompted to enter details for a PRIDE user account. If you haven't already

Appendix 1: Proteomics data formats
Proteomics data is available in a variety of formats. The ones used by PRIDE and ProteomeXchange are defined here, each listed by file name followed by file content:

Mass spectrometry output files ('Raw' data)
This is the data and metadata generated by be the original profile mode scans or may a processing, such as centroiding, applied.
They may be available as mass spectrome spectra in a standardised format (see below below).
It is important that all the scans generated mzML -successor to the others (developed These data formats can be used to represe data. In addition to the mass spectra, they context to the information.

Processed peak lists
Heavily processed form of mass spectrome files via various (semi-) automatic steps, e.g.: centroiding, d These files are formatted in plain text, with mgf.

Protein/peptide identifications
Proteomics mass spectra can be matched identifications for those spectra. Typically a identified if the score attributed to a peptide priori or a posteriori defined threshold. In th initial identification will consist of a peptide a list of proteins from the identified peptides discernible process with its own input and o overall identification software.

Search engine output files
These files contain the data and metadata search engines) used for performing the ide peptides and proteins. Each search engine The outputs are typically formatted in eithe mzIdentML [69] -provides a common forma from any search engine.
To allow a full representation of the proces in the PX tool, the search engine output file or mzIdentML version 1. All peak list formats (mgf, dta, ms2, pkl) can be supported, but they will not be considered raw data. They will be considered as 'peak list processed files' or simply 'peak'.
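The rule that peak list formats are never treated as raw data can be illustrated with a short file classifier. This is a minimal sketch: the category labels, the ".mzid" extension for mzIdentML files and the default ".raw" extension for raw files are assumptions made for this example, and the function is not part of the PX submission tool.

```python
from pathlib import Path
from typing import FrozenSet

# Peak list formats named in the text above.
PEAK_EXTENSIONS = frozenset({".mgf", ".dta", ".ms2", ".pkl"})

# Assumption: mzIdentML result files carry the ".mzid" extension.
RESULT_EXTENSIONS = frozenset({".mzid"})

# Assumption: raw file extensions depend on the instrument, so the caller
# can override this default set.
DEFAULT_RAW_EXTENSIONS = frozenset({".raw"})


def classify(
    filename: str,
    raw_extensions: FrozenSet[str] = DEFAULT_RAW_EXTENSIONS,
) -> str:
    """Return an illustrative category label for one submission file."""
    suffix = Path(filename).suffix.lower()
    if suffix in PEAK_EXTENSIONS:
        # Peak lists are processed files, never raw data.
        return "PEAK"
    if suffix in RESULT_EXTENSIONS:
        return "RESULT"
    if suffix in raw_extensions:
        return "RAW"
    return "OTHER"


if __name__ == "__main__":
    for name in ["run1.raw", "run1.mgf", "run1.mzid", "gel_image.png"]:
        print(name, classify(name))
```

Checking the peak list extensions first guarantees that a peak list is never labelled as raw data, even if a caller passes an overlapping set of raw extensions.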