Dataset for the quantitative proteomics analysis of the primary hepatocellular carcinoma with single and multiple lesions

Hepatocellular carcinoma (HCC) is one of the most common malignant tumors and the second leading cause of cancer-related death worldwide. The tumor tissues and the adjacent noncancerous tissues obtained from HCC patients with single and multiple lesions were quantified using iTRAQ. A total of 5513 proteins (FDR of 1%) were identified, corresponding to roughly 27% of the total liver proteome. Of these, 107 and 330 proteins were dysregulated in HCC tissue with multiple lesions (MC group) and HCC tissue with a single lesion (SC group), respectively, compared with their noncancerous tissues (MN and SN groups). Bioinformatics analysis (GO, KEGG and IPA) allowed these data to be organized into distinct categories. The data accompanying the manuscript on this approach (Xing et al., J. Proteomics (2015), http://dx.doi.org/10.1016/j.jprot.2015.08.007 [1]) have been deposited to iProX with identifier IPX00037601.


Specifications table
Subject area: Biology
More specific subject area: Proteomics of hepatocellular carcinoma
Type of data: List of identified proteins as tables (.xls); raw data on website
How data was acquired: The data were acquired by liquid chromatography-tandem mass spectrometry (LC-MS/MS). The samples were separated by an Acquity UPLC system (Waters Corporation, Milford, MA) and detected by a Nano-Aquity UPLC system (Waters Corporation, Milford, MA) connected to a quadrupole-Orbitrap mass spectrometer (Q-Exactive) (Thermo Fisher Scientific, Bremen, Germany).
Filtered and analyzed data are supplied here, and raw data have also been deposited to the integrated Proteome resources (iProX) with identifier IPX00037601 (http://www.iprox.org/index).

Value of the data
The proteome of hepatocellular carcinoma with single and multiple lesions was analyzed using iTRAQ technology.
A total of 5513 proteins (FDR of 1%) were identified, corresponding to roughly 27% of the total liver proteome.
The in-depth proteomics analysis of HCC tumor tissues with single and multiple lesions might be useful for further study of the underlying mechanisms.

1. Data, experimental design, materials and methods

Data and experimental design
The data show the lists of proteins identified and quantified in the HCC tumor tissues with single and multiple lesions. The tissues were divided into 4 groups: cancerous tissues from HCC patients with multiple observed lesions (MC group, n = 30); surrounding noncancerous tissues from HCC patients with multiple observed lesions (MN group, n = 30); cancerous tissues from primary HCC patients with a single observed lesion (SC group, n = 30); and surrounding noncancerous tissues from primary HCC patients with a single observed lesion (SN group, n = 30). The detailed characteristics of the selected HCC patients are listed in Table 1. For each group, every 5 individual samples of equal tissue weight were mixed, and the proteins were then extracted from the mixed samples. The samples were then labeled with the iTRAQ 8-plex reagent as follows: the four groups (MC, MN, SC and SN) were labeled with the 113, 114, 115 and 116 isobaric tags, respectively, and the peptides from the biological replicates of these 4 groups were labeled with 117, 118, 119 and 121, respectively. The iTRAQ 8-plex labeling was independently repeated 3 times, designated A, B and C. This gave 6 pooled protein extracts for each group, which minimizes the influence of individual differences between patients.

Table 1. Characteristics of the selected HCC patients (fragment; the headings of the two count columns and the label of the first row group are missing).

(label missing)
  None                       3     5
  Mild                      13    12
  Moderate                  13    12
  Severe                     1     1
Tumor boundaries
  Distinct                  18    22
  Indistinct                12     8
Differentiation degree
  I-II                       8     4
  II-III                    17    22
  III-IV                     5     4
Vascular tumor thrombosis
  No                        26    25
  Yes                        4     5
Tumor encapsulation
  No                         2     6
  Incomplete                14    10
  Complete                  14    14
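The pooling and labeling design described above can be laid out programmatically to check that it yields six pooled extracts per group. The following is a minimal sketch only: the group names, tag numbers, run names, group size and pool size come from the text, but pooling samples in consecutive order and the assignment of a particular pool to a particular run are assumptions made purely for illustration.

```python
GROUPS = ["MC", "MN", "SC", "SN"]
TAGS_FIRST = [113, 114, 115, 116]    # first set of pooled extracts
TAGS_SECOND = [117, 118, 119, 121]   # biological replicates of the same groups
RUNS = ["A", "B", "C"]               # three independent 8-plex labelings
SAMPLES_PER_GROUP = 30
POOL_SIZE = 5


def pool_samples(n_samples, pool_size):
    """Split sample indices 1..n_samples into pools of pool_size.

    Consecutive grouping is an assumption; the text only states that
    every 5 samples were mixed.
    """
    ids = list(range(1, n_samples + 1))
    return [ids[i:i + pool_size] for i in range(0, n_samples, pool_size)]


def labeling_layout():
    """Return (run, tag, group, pool_number) rows for all 24 channels.

    The mapping of pool numbers to runs is hypothetical.
    """
    rows = []
    for run_idx, run in enumerate(RUNS):
        for offset, tags in enumerate((TAGS_FIRST, TAGS_SECOND)):
            for tag, group in zip(tags, GROUPS):
                pool_number = 2 * run_idx + offset + 1
                rows.append((run, tag, group, pool_number))
    return rows


pools = pool_samples(SAMPLES_PER_GROUP, POOL_SIZE)
layout = labeling_layout()

for row in layout:
    print(row)
```

Each of the three runs uses all eight channels, so every group appears twice per run and six times overall, matching the six pools of five patients obtained from 30 samples.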

Materials and methods
Tissue samples, including the cancerous and surrounding noncancerous tissues, were obtained from 30 primary HCC patients with multiple observed lesions and 30 primary HCC patients with a single observed lesion. All patients underwent radical surgery at Mengchao Hepatobiliary Hospital of Fujian Medical University between August 2010 and January 2013. The protein concentration of the extracts from these two types of HCC tissue was determined by BCA assay (TransGen Biotech, Beijing, China) following the manufacturer's protocol. Afterwards, 100 μg of protein per condition was treated with DTT (8 mM) and iodoacetamide (50 mM) for reduction and alkylation. The proteins were then digested with sequence-grade modified trypsin (Promega, Madison, WI), and the resulting peptide mixture was labeled using the iTRAQ reagent kit (AB SCIEX, USA).
The peptide mixture was fractionated by high pH separation using an Acquity UPLC system (Waters Corporation, Milford, MA) connected to a reverse phase column (BEH C18, 1.7 μm, 2.1 × 50 mm, Waters Corporation, Milford, MA). High pH separation was performed using a linear gradient from 5% B to 35% B in 20 min (solution B: 20 mM ammonium formate in 90% ACN, with the pH adjusted to 10.0 with ammonium hydroxide). The column flow rate was maintained at 600 μl/min and the column was kept at room temperature. In total, 40 fractions were collected, and pairs of fractions separated by the same time interval were pooled to reduce the number of fractions (1 with 21, 2 with 22, and so on) [2]. The resulting twenty fractions were dried in a vacuum concentrator for further use.
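The pooling rule described above (fraction 1 with 21, 2 with 22, and so on) can be written out explicitly. This is a small illustrative sketch; the function name and its parameters are hypothetical and not part of any instrument software.

```python
def concatenate_fractions(n_fractions=40, n_pools=20):
    """Pool fractions that are n_pools apart: (1, 21), (2, 22), ..."""
    if n_fractions % n_pools != 0:
        raise ValueError("n_fractions must be a multiple of n_pools")
    return [
        list(range(first, n_fractions + 1, n_pools))
        for first in range(1, n_pools + 1)
    ]


pooled = concatenate_fractions()
for number, members in enumerate(pooled, start=1):
    print(f"pooled fraction {number}: original fractions {members}")
```

Combining an early-eluting with a late-eluting fraction keeps the peptides in each pooled fraction spread across the subsequent low pH gradient while halving the number of LC-MS/MS runs.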
The fractions were then separated by nano-LC and analyzed by on-line electrospray tandem mass spectrometry. The experiments were performed on a Nano-Aquity UPLC system (Waters Corporation, Milford, MA) connected to a quadrupole-Orbitrap mass spectrometer (Q-Exactive) (Thermo Fisher Scientific, Bremen, Germany) equipped with an online nano-electrospray ion source. An 8 μl peptide sample was loaded onto the trap column (Thermo Scientific Acclaim PepMap C18, 100 μm × 2 cm) at a flow rate of 10 μl/min and subsequently separated on the analytical column (Acclaim PepMap C18, 75 μm × 50 cm) with a linear gradient from 2% D to 40% D in 135 min (solution D: 0.1% formic acid in ACN). The Q-Exactive mass spectrometer was operated in data-dependent mode to switch automatically between MS and MS/MS acquisition. Survey full-scan MS spectra (m/z 350-1200) were acquired at a mass resolution of 70 K, followed by 15 sequential higher-energy collisional dissociation (HCD) MS/MS scans at a resolution of 17.5 K. In all cases, one microscan was recorded, with a dynamic exclusion of 30 s.
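The data-dependent acquisition logic described above (a survey scan followed by MS/MS of the 15 most intense precursors, with a 30 s dynamic exclusion) can be illustrated with a simplified sketch. This is not the instrument's control software: the function name, the peak representation as (m/z, intensity) pairs and the matching of excluded precursors by m/z rounded to two decimals are all assumptions made for illustration.

```python
def select_precursors(peaks, now_s, exclusion, top_n=15, exclusion_s=30.0,
                      mz_min=350.0, mz_max=1200.0):
    """Pick up to top_n precursors from one survey scan.

    peaks      -- list of (mz, intensity) tuples from the survey scan
    now_s      -- acquisition time of the survey scan in seconds
    exclusion  -- dict mapping rounded m/z to the time its exclusion ends;
                  updated in place (hypothetical bookkeeping)
    """
    # Forget precursors whose exclusion window has expired.
    for key in [k for k, until in exclusion.items() if until <= now_s]:
        del exclusion[key]

    chosen = []
    # Walk through peaks from most to least intense.
    for mz, _intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        if not (mz_min <= mz <= mz_max):
            continue  # outside the survey scan range
        key = round(mz, 2)
        if key in exclusion:
            continue  # fragmented recently, skip for now
        chosen.append(mz)
        exclusion[key] = now_s + exclusion_s
        if len(chosen) == top_n:
            break
    return chosen


exclusion_list = {}
survey = [(400.0, 10.0), (500.0, 30.0), (1300.0, 99.0), (600.0, 20.0)]
first = select_precursors(survey, 0.0, exclusion_list, top_n=2)
second = select_precursors(survey, 10.0, exclusion_list, top_n=2)
third = select_precursors(survey, 31.0, exclusion_list, top_n=2)
print(first, second, third)
```

In the toy example, `top_n` is lowered to 2 so that the effect of the exclusion list is visible with only a handful of peaks.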

Data analysis
All the raw files generated by the Q-Exactive instrument were converted into mzXML and MGF files using the msconvert module in the Trans-Proteomic Pipeline (TPP 4.6.2). All MGF files were searched using Mascot (Matrix Science, London, UK; version 2.3.0) against a human database provided by The Universal Protein Resource (http://www.uniprot.org/uniprot, release of 2014-04-10, with 20,264 entries). Using the results from Scaffold (version 4.3.2), we quantified 5513 proteins in three iTRAQ 8-plex labeling replicates. The complete list of identified proteins in our dataset is shown in Table S1. The detailed characteristics of the proteomes of primary HCC with single and multiple lesions, including molecular weight (MW), isoelectric point (pI), hydrophobicity, exponentially modified protein abundance index (emPAI), quantitative clustering, average coefficient of variation (CV) and quantification results with percentage variability, are included in the list as well. The distributions of unique peptide numbers per protein, MW, pI and hydrophobicity also showed that the overall proteome datasets of primary HCC with single and multiple lesions had no strong bias (Fig. 1). In this dataset, 107 and 330 proteins were classified as differentially expressed in HCC tumor tissues with multiple and single lesions, respectively, compared to the surrounding noncancerous tissues (Fig. 2A, B). All of the differentially expressed proteins had a mean fold change of at least 1.5 in either direction (absolute log2 ratio of 0.58 or more) with a p value less than 0.05 (paired t-test), and were required to show the same direction of change in all six biological replicates. Among these differentially expressed proteins, 71 proteins altered their expression in both HCC types (Fig. 2C). GO annotation analysis showed that these proteins were major participants in the oxidation-reduction process and cellular metabolic processes (Fig. 2D).
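The three selection criteria for differentially expressed proteins stated above can be combined into a single filter. The following is a hedged sketch rather than the authors' actual pipeline: it assumes that the per-replicate values are log2 tumor/noncancerous ratios, that the mean is taken on the log2 scale, and that the paired t-test p value has already been computed elsewhere and is passed in. The function name and arguments are hypothetical.

```python
import math

# A 1.5-fold change corresponds to log2(1.5), about 0.585 (quoted as 0.58).
LOG2_CUTOFF = math.log2(1.5)


def is_differential(log2_ratios, p_value, cutoff=LOG2_CUTOFF, alpha=0.05,
                    n_required=6):
    """Apply the fold-change, p value and direction-consistency criteria.

    log2_ratios -- log2(tumor / noncancerous) for each biological replicate
    p_value     -- precomputed paired t-test p value (not computed here)
    """
    if len(log2_ratios) != n_required:
        return False
    mean_log2 = sum(log2_ratios) / len(log2_ratios)
    all_up = all(r > 0 for r in log2_ratios)
    all_down = all(r < 0 for r in log2_ratios)
    return abs(mean_log2) >= cutoff and p_value < alpha and (all_up or all_down)


print(round(LOG2_CUTOFF, 3))
print(is_differential([0.7, 0.8, 0.6, 0.9, 0.75, 0.65], p_value=0.01))
print(is_differential([0.7, 0.8, -0.1, 0.9, 0.75, 0.65], p_value=0.01))
```

The second example fails only because one replicate changes in the opposite direction, even though its mean log2 ratio and p value would pass on their own.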

Bioinformatics analysis
The Gene Ontology (GO) annotation and pathway enrichment analysis of all the identified proteins and the differentially expressed proteins were performed using the online tool DAVID (http://david.abcc.ncifcrf.gov/). The quantitative iTRAQ ratios of the 36 proteins that were dysregulated in the MC group compared to the MN group, but not in primary HCC with a single lesion, were plotted on a heatmap (Fig. 3A). The names of these dysregulated proteins are listed in Table 2. We further analyzed the biological processes in which these proteins are involved by GO analysis (Fig. 3C). Meanwhile, 142 up-regulated proteins and 117 down-regulated proteins appeared specifically in the HCC with a single lesion group, but not in the HCC with multiple lesions group; the up- and down-regulated proteins also form clearly distinct clusters in the heatmap (Fig. 3B). The list of protein names is displayed in Table 3. We further analyzed the biological processes in which these proteins are involved by GO analysis (Fig. 3D). GO analysis of the molecular function and cellular component of the differentially expressed proteins that are dysregulated only in HCC with a single lesion or only in HCC with multiple lesions is displayed in Fig. 4.
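The group-specific and shared protein counts reported in this article fit together as a simple set partition: 107 proteins for multiple lesions and 330 for a single lesion, with 71 in common, leave 36 and 259 (142 + 117) group-specific proteins. The sketch below checks this arithmetic. The integer identifiers are synthetic placeholders chosen only to reproduce the reported set sizes; they are not real protein accessions, and the function name is hypothetical.

```python
def partition(multi, single):
    """Split two sets into (only in multi, shared, only in single)."""
    return multi - single, multi & single, single - multi


# Placeholder IDs sized to match the reported counts (107 and 330, 71 shared).
multi_lesion = set(range(0, 107))
single_lesion = set(range(36, 366))

only_multi, shared, only_single = partition(multi_lesion, single_lesion)
print(len(only_multi), len(shared), len(only_single))
```

With real data, the two input sets would hold the accessions of the differentially expressed proteins from each comparison, and the three returned sets would correspond to the proteins shown in Fig. 3A, Fig. 2C and Fig. 3B.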
The biological functions and signaling pathway annotations of the differentially expressed proteins were analyzed with the Ingenuity Pathways Analysis (IPA) software (version 7.5), which is based on the Ingenuity Pathways database. The key functions of the differentially expressed proteins in HCC with single and multiple lesions according to the IPA analysis are displayed in Fig. 5. The GO annotations, signaling pathways and networks were ranked in terms of their enrichment in differentially expressed proteins.