Dataset of target mass spectrometric proteome profiling for human chromosome 18

Proteome profiling is a type of quantitative analysis that reveals the level of protein expression in a sample. Proteome profiling by selected reaction monitoring (SRM) is one of the approaches used in the Chromosome-centric Human Proteome Project (C-HPP). Here we describe the dataset generated during the pilot phase of the Russian part of the C-HPP, which focused on human Chr 18 proteins. Proteome profiling was performed by SRM with stable isotope-labeled standards (SRM/SIS) for plasma, liver tissue and HepG2 cells. The dataset includes both positive and negative results of protein detection. These data were partly discussed in recent publications, “Chromosome 18 Transcriptome Profiling and Targeted Proteome Mapping in Depleted Plasma, Liver Tissue and HepG2 Cells” [1] and “Chromosome 18 transcriptoproteome of liver tissue and HepG2 Cells and targeted proteome mapping in depleted plasma: Update 2013” [2]. They support the accompanying publication “State of the Chromosome 18-centric HPP in 2016: Transcriptome and Proteome Profiling of Liver Tissue and HepG2 Cells” [3], and are deposited at ProteomeXchange via the PASSEL repository with the dataset identifier PASSEL: PASS00697 for liver and the HepG2 cell line.


Value of the data
These data characterize the diversity of chromosome 18 protein species in liver tissue and the HepG2 cell line using SRM.
These data could be of interest to laboratories studying protein reference levels and cross-tissue biological variability of the proteome.
These data could be useful for protein, peptide and transition selection in SRM assay development. The dataset may also be used as a test set for automated SRM data processing algorithms.

Data
This dataset describes the conditions of proteome profiling of liver tissue and the HepG2 cell line. The target protein list included 268 proteins of chromosome 18. Data were processed automatically to quantify proteins in each biosample. The dataset includes raw data, the transition list, Skyline files and sample preparation instructions (all available in PASSEL), 2 figures, and a Supplementary table with protein copy numbers in liver tissue and the HepG2 cell line.

Sample preparation
The trypsin digestion of liver tissue and cell lysates was performed as described in Ponomarenko et al. [2].

Peptide synthesis
The peptides were produced by solid-phase peptide synthesis on the Overture (Protein Technologies, USA) or Hamilton Microlab STAR devices according to the method published in Hood et al. [4]. For the synthesis of isotope-labeled peptides, isotope-labeled leucine (Fmoc-Leu-OH-13C6,15N) was used instead of unlabeled leucine (Fmoc-Leu-OH) [5].

Transition list
The list of peptides for the 268 chromosome 18 proteins was generated manually using data on the occurrence of proteotypic peptides in the proteomic repositories GPMdb, ProteinAtlas and PRIDE, together with the MaRiMba criteria (the protocol was described earlier in Supplementary note 2, Zgoda et al. [1]). For each protein, one "best-flyer" peptide was chosen. For each peptide, the 3 most intense transitions [6] were chosen on the basis of previous research results.
All 268 peptides were distributed in equal parts over 3 SRM assays (A-C) according to their calculated retention times to avoid interference.
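The distribution of peptides over the three assays can be illustrated with a short sketch. The exact allocation rule is not given here, so the code below assumes one plausible scheme: peptides are sorted by calculated retention time and dealt out in turn to assays A, B and C, so that neighbours on the retention-time axis end up in different assays. The peptide names and retention times are hypothetical placeholders, not values from the dataset.

```python
def split_into_assays(peptides, n_assays=3):
    """Distribute (peptide, calculated_rt) pairs over n_assays in near-equal parts.

    Assumption: round-robin assignment after sorting by retention time,
    so that peptides eluting close together land in different assays.
    """
    ordered = sorted(peptides, key=lambda p: p[1])
    labels = [chr(ord("A") + i) for i in range(n_assays)]
    assays = {label: [] for label in labels}
    for i, peptide in enumerate(ordered):
        assays[labels[i % n_assays]].append(peptide)
    return assays


# Hypothetical input: 268 placeholder peptides with made-up retention times (min).
peptides = [(f"PEP{i:03d}", 5.0 + 0.2 * i) for i in range(268)]
assays = split_into_assays(peptides)
sizes = {label: len(members) for label, members in assays.items()}
print(sizes)
```

Because 268 is not divisible by 3, the assays cannot be exactly equal in size under this scheme; one assay receives a single extra peptide.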
LC-SRM analysis was performed as described earlier in Supplementary note 2, Zgoda et al. [1]. Each SRM experiment was repeated in 3 technical runs. Each transition peak was characterized by the following variables: retention time, peak height, and SIS/endogenous peak area ratio. No manual inspection was performed to find transitions similar to those of the target peptides or to identify detected peptides.

How to use data
The dataset consists of several file types (Fig. 1). To reuse the dataset and extract relevant information from it, one can install Skyline [7], a freely available, open-source Windows client application (for each biosample there is one Skyline file, which includes the transitions and technical run information). Skyline makes it possible to open the raw data and to visualize (Fig. 2) and analyze SRM data. It is also possible to use proprietary software (Agilent MassHunter Workstation Software) for data visualization. For example, a raw file is named X18HPG_C-r002.d (Fig. 2). Skyline files are also named using these keys.

Quantification
Calibration curves were obtained for each of the target peptides using mixtures of purified synthetic non-labeled peptides in the concentration range of 100-100 fmole/ml, to which the corresponding isotopically labeled standards (SIS) were added at a concentration of 2 fmole/ml. All calibration curves were linear in the range of 100-0.1 fmole/ml and showed a linear regression coefficient of 0.95.
Prior to sample processing, the performance of the LC-SRM platforms used was validated by obtaining calibration curves for the corresponding set of SIS and synthetic non-labeled peptides. Moreover, after five LC-SRM runs we verified the validity of the calibration by analyzing one of the calibration peptide solutions at 10 fmole/ml.
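The calibration check described above can be sketched as an ordinary least-squares fit of the non-labeled/SIS peak area ratio against the non-labeled peptide concentration. This is only an illustration: the choice of axes is an assumption, the area ratios below are invented numbers rather than values from the dataset, and the function names are hypothetical.

```python
import math


def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; also returns Pearson r."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


def back_calculate(ratio, slope, intercept):
    """Concentration predicted by the calibration line for a measured area ratio."""
    return (ratio - intercept) / slope


# Hypothetical calibration points: concentration of non-labeled peptide (fmole/ml)
# and made-up non-labeled/SIS peak area ratios with SIS fixed at 2 fmole/ml.
conc = [0.1, 1.0, 10.0, 100.0]
ratio = [0.06, 0.48, 5.2, 49.0]

slope, intercept, r = linear_fit(conc, ratio)

# Re-analysis of the 10 fmole/ml calibration solution as a control point.
control = back_calculate(5.2, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.4f} control={control:.2f}")
```

With SIS at 2 fmole/ml, an ideal response would give a slope close to 0.5; a control value that drifts far from 10 fmole/ml would suggest that the calibration needs to be repeated.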
The detection limit was defined as the lowest concentration determined on the linear part of the calibration curve. It varied between peptides in the range from 100 amole/ml to 200 amole/ml. Peak area ratios between the labeled (SIS) and target peptides were used to calculate the concentration of the target peptide in a sample. Peak area ratios were obtained using Skyline software.
$$C_{pept} = C_{lab} \cdot \frac{S_{pept}}{S_{lab}}$$
where $C_{pept}$ is the target peptide concentration, $C_{lab}$ is the labeled peptide (SIS) concentration (see Quantification), $S_{pept}$ is the area of the target peptide peak, and $S_{lab}$ is the area of the labeled peptide peak.
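As a worked illustration of this formula, the following minimal sketch computes the target peptide concentration from a pair of peak areas. The function name and the numeric values are hypothetical; in practice the peak areas would be exported from Skyline.

```python
def target_concentration(c_lab, s_pept, s_lab):
    """C_pept = C_lab * S_pept / S_lab.

    c_lab  -- concentration of the labeled (SIS) peptide
    s_pept -- peak area of the target (endogenous) peptide
    s_lab  -- peak area of the labeled (SIS) peptide
    The result has the same units as c_lab.
    """
    if s_lab <= 0:
        raise ValueError("labeled peptide peak area must be positive")
    return c_lab * s_pept / s_lab


# Hypothetical example: SIS at 2 fmole/ml, target peak half the area of the SIS peak.
c_pept = target_concentration(c_lab=2.0, s_pept=15000.0, s_lab=30000.0)
print(c_pept)  # 1.0 (fmole/ml)
```

A target peak with half the area of the SIS peak therefore corresponds to half the SIS concentration.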
All calculated target peptide copy numbers for liver tissue and HepG2 cells are listed in Supplementary Table 1.