Proteomics dataset: The colon mucosa from inflammatory bowel disease patients, gastrointestinal asymptomatic rheumatoid arthritis patients, and controls

The datasets presented in this article are related to the research articles entitled “Neutrophil Extracellular Traps in Ulcerative Colitis: A Proteome Analysis of Intestinal Biopsies” (Bennike et al., 2015 [1]), and “Proteome Analysis of Rheumatoid Arthritis Gut Mucosa” (Bennike et al., 2017 [2]). The colon mucosa represents the main interacting surface of the gut microbiota and the immune system. Studies have found an altered composition of the gut microbiota in rheumatoid arthritis patients (Zhang et al., 2015; Vaahtovuo et al., 2008; Hazenberg et al., 1992) [5], [6], [7] and inflammatory bowel disease patients (Morgan et al., 2012; Abraham and Medzhitov, 2011; Bennike, 2014) [8], [9], [10]. Therefore, we characterized the proteome of colon mucosa biopsies from 10 inflammatory bowel disease ulcerative colitis (UC) patients, 11 gastrointestinal healthy rheumatoid arthritis (RA) patients, and 10 controls. We conducted the sample preparation and liquid chromatography mass spectrometry (LC-MS/MS) analysis of all samples in one batch, enabling label-free comparison between all biopsies. The datasets are made publicly available to enable critical or extended analyses. The proteomics data and search results have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifiers PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples.



Subject area
Biology

More specific subject area
Characterization of the proteome of the colon mucosa of ulcerative colitis patients, gastrointestinal healthy rheumatoid arthritis patients, and controls.

Type of data
Raw mass spectrometry files and text/Excel files

How data was acquired
Mass spectrometry, liquid chromatography. Data was acquired using a high-resolution/high-accuracy Q Exactive Plus (Thermo Scientific) mass spectrometer.

Data format
Raw and analyzed data.

Experimental factors
Human colon mucosal biopsies from ulcerative colitis patients, gastrointestinal healthy rheumatoid arthritis patients, and controls.

Experimental features
Biopsies were extracted by colonoscopy and immediately snap-frozen in liquid nitrogen. The biopsies were digested with trypsin and analyzed by electrospray ionization liquid chromatography mass spectrometry.

Value of the data
The dataset contains the largest number of identified human proteins from colon mucosa biopsies as of 2017.
The dataset was obtained in one batch, allowing for label-free comparison of the colon mucosa of ulcerative colitis patients, rheumatoid arthritis patients, and controls.
The first dataset of the colon mucosa of gastrointestinal healthy RA patients.
The datasets can be analyzed for novel proteome effects of disease and treatments.
The datasets allow for extended statistical analysis, and we encourage such collaborations.

Data
The datasets in this article provide information on the proteome of the colon mucosa of inflammatory bowel disease patients with ulcerative colitis [1], gastrointestinal healthy rheumatoid arthritis patients [2], and controls. The study was motivated by the finding of an altered composition of the gut microbiota in rheumatoid arthritis patients [5][6][7] and inflammatory bowel disease patients [8][9][10]. All biopsies were handled on-site by the project group to limit technical variance. The biopsies were randomized, digested using a modified filter-aided sample preparation protein digestion protocol, and analyzed in technical triplicates by high-throughput proteomics on a Q Exactive mass spectrometer. All experimental factors were kept constant, allowing for a label-free quantitative analysis between all samples. The unprocessed proteomics data files (Table 1) and processed search result files (Table 2) have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository, with the dataset identifier PXD001608 for ulcerative colitis and control samples, and PXD003082 for rheumatoid arthritis samples [13,14].
A cumulative total of 6768 proteins (FDR < 1%) were identified, representing the largest proteome dataset of the colon mucosa so far. Additionally, the dataset represents the first analysis of the colon mucosa of gastrointestinal healthy rheumatoid arthritis patients. The data-analysis results from the analysis with MaxQuant can be downloaded as zipped txt-files, the content of which is described in the tables.pdf, also in the zipped file. The result file proteinGroups.txt contains all identified proteins at < 1% FDR, and information regarding each protein, e.g. the corresponding label-free relative quantitation value (LFQ). Additional information regarding the participants can be found in the publications.

Table 1 Raw-datafiles in PXD001608 and PXD003082. All samples were analyzed in technical triplicates, and all raw-files are named accordingly (e.g. Ctrl_10_3 is the third repeat of control 10). "Poor R" signifies a Pearson's correlation coefficient R < 0.95 between the technical repeats, and additional data validation is recommended for studies including these datafiles. The number of identified proteins in each replicate is given, without/with the MaxQuant match between runs feature, which transfers MS/MS information between different LC-MS/MS analyses. RA: Rheumatoid arthritis, UC: ulcerative colitis, NA: Not available.
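The raw-file naming convention and the "Poor R" criterion in the Table 1 caption can be illustrated with a short sketch. It assumes that the correlation is computed pairwise between the quantitative profiles of the technical repeats of one biopsy; the article does not state which intensities were correlated or whether they were log-transformed, so the function names and the example values below are hypothetical.

```python
import itertools
import math
import re


def parse_raw_name(name):
    """Split a raw-file name such as 'Ctrl_10_3' into (group, subject, repeat)."""
    match = re.fullmatch(r"([A-Za-z]+)_(\d+)_(\d+)", name)
    if match is None:
        raise ValueError(f"unexpected raw-file name: {name!r}")
    group, subject, repeat = match.groups()
    return group, int(subject), int(repeat)


def pearson(x, y):
    """Pearson's correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def poor_pairs(repeats, threshold=0.95):
    """Return (name_a, name_b, r) for every pair of repeats with r < threshold.

    `repeats` maps a raw-file name to its intensity profile (assumed layout).
    """
    flagged = []
    for a, b in itertools.combinations(sorted(repeats), 2):
        r = pearson(repeats[a], repeats[b])
        if r < threshold:
            flagged.append((a, b, r))
    return flagged


# Illustrative values only, not taken from the deposited data.
example = {
    "Ctrl_10_1": [1.0, 2.0, 3.0, 4.0],
    "Ctrl_10_2": [1.1, 2.0, 3.1, 3.9],
    "Ctrl_10_3": [4.0, 1.0, 3.0, 2.0],
}
poor = poor_pairs(example)
```

In this toy input the third repeat correlates poorly with the other two, so both pairs involving it are returned.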

Study cohort and sample collection
The sample material was extracted and processed as described in [1] and [2]. Colon mucosal biopsies (roughly 1 mm³) were sampled 40 cm from the anus by sigmoidoscopy, at the Regional Hospital Silkeborg, Denmark, from 10 ulcerative colitis patients, 11 rheumatoid arthritis patients, and 10 controls in the period from 2012 to 2013. The biopsies were immediately transferred to cryotubes and snap-frozen in liquid nitrogen, followed by storage at −80°C until proteomics sample preparation. All participants gave written informed consent prior to participation in the study, and the project was approved by The Regional Scientific Ethical Committee (S-20120204) and the Danish Data Protection Agency (2008-58-035).

Proteomic sample preparation
The biopsies were randomized and enzymatically digested using a modified filter-aided sample preparation protein digestion protocol [17][18][19][20][21][22]. Briefly, the biopsies were homogenized in 0.5 mL cold sample buffer (5% sodium deoxycholate, 50 mM triethylammonium bicarbonate, pH 8.5). The lysate protein concentration was estimated by absorbance at 280 nm measured using a NanoDrop 1000 UV-vis spectrophotometer (Thermo Scientific, Waltham, MA, USA). Additionally, the concentration of four biopsy lysates was determined using a bicinchoninic acid assay (BCA) with bovine serum albumin as standard, measured using an Infinite microplate reader (Tecan, Männedorf, Switzerland). The NanoDrop measurements were calibrated using the BCA results. 100 µg protein was transferred to 30 kDa molecular weight cutoff spin-filters (Millipore, Billerica, MA, USA) to facilitate buffer exchanges by centrifugation at 15,000 × g for 15 min between all steps. Protein disulfide bonds were reduced by addition of 100 µL 10 mM tris(2-carboxyethyl)phosphine (Thermo Scientific, Waltham, MA, USA) and alkylated by addition of 100 µL 50 mM 2-iodoacetamide (Sigma-Aldrich, St. Louis, MO, USA) in sample buffer. Two µg sequencing grade modified trypsin (Promega, Madison, WI, USA) diluted in lysis buffer with 0.5% sodium deoxycholate was added to the spin-filter, and the proteins were digested to peptides overnight at 37°C. The peptide material was eluted from the spin-filter, purified by phase inversions with 1:1 (v/v) ethyl acetate with 1% formic acid, dried down in a vacuum centrifuge overnight, and stored at −80°C for a maximum of one week prior to analysis.
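The calibration of the absorbance-based estimates against the BCA results can be sketched as follows. The article does not describe the calibration model, so this example assumes a single least-squares scaling factor through the origin; the function names and all numbers are hypothetical placeholders, not measurements from the study.

```python
def fit_scale_factor(a280_estimates, bca_values):
    """Least-squares factor k (through the origin) so that k * A280 ≈ BCA.

    Assumed model: the article only states that the NanoDrop
    measurements were calibrated using the BCA results.
    """
    numerator = sum(x * y for x, y in zip(a280_estimates, bca_values))
    denominator = sum(x * x for x in a280_estimates)
    if denominator == 0:
        raise ValueError("A280 estimates must not all be zero")
    return numerator / denominator


def load_volume_ul(a280_estimate, scale, target_ug=100.0):
    """Lysate volume (µL) holding target_ug protein after calibration.

    Concentrations are in mg/mL, which equals µg/µL.
    """
    calibrated = a280_estimate * scale
    return target_ug / calibrated


# Placeholder concentrations (mg/mL) for four lysates measured both ways.
scale = fit_scale_factor([1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 1.5, 2.0])
volume = load_volume_ul(6.0, scale)
```

With these placeholder values the fitted factor is 0.5, so a lysate reading 6.0 mg/mL by absorbance is treated as 3.0 mg/mL, and about 33 µL holds the 100 µg loaded onto the spin-filter.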

Proteomic analysis
The peptides were analyzed by LC-MS/MS using an UltiMate 3000 UPLC system (Thermo Scientific, Waltham, MA, USA) coupled online to a Q Exactive Plus mass spectrometer (Thermo Scientific). Five µg peptide material was loaded onto a 2 cm reverse phase C18-material trapping column and separated on a 50 cm analytical column, both from Acclaim PepMap100 (Thermo Scientific). The liquid phase consisted of 96% solvent A (0.1% formic acid) and 4% solvent B (0.1% formic acid in acetonitrile), at a flow rate of 300 nL/min. The peptides were eluted from the column by increasing to 8% solvent B and subsequently to 30% solvent B on a 225 min ramp gradient, and introduced into the mass spectrometer by a picotip emitter for electrospray ionization (New Objective, Woburn, MA, USA). The mass spectrometer was operated in positive mode with data-dependent acquisition, alternating between survey spectra and isolation/fragmentation spectra using a top12 method. Selected eluting peptides were excluded from re-analysis for 30 s. All biopsies were analyzed in triplicate in a random order.
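A randomized injection order for the technical triplicates could be generated along these lines. This is a minimal sketch: it assumes that all injections were shuffled together, which the article does not specify, and the group prefixes "RA" and "UC" are assumed by analogy with the "Ctrl" prefix given in Table 1.

```python
import random


def build_run_order(cohort, repeats=3, seed=0):
    """Return a shuffled list of run names such as 'Ctrl_10_3'.

    `cohort` maps a group prefix to its number of biopsies. A fixed seed
    keeps the order reproducible; the seed value itself is arbitrary.
    """
    runs = [
        f"{group}_{subject}_{repeat}"
        for group, count in cohort.items()
        for subject in range(1, count + 1)
        for repeat in range(1, repeats + 1)
    ]
    rng = random.Random(seed)
    rng.shuffle(runs)
    return runs


# Cohort sizes from the article: 10 UC, 11 RA, 10 controls.
run_order = build_run_order({"UC": 10, "RA": 11, "Ctrl": 10})
```

The 31 biopsies in triplicate give 93 runs in total.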

Data processing
The generated RAW-files were searched with MaxQuant 1.5.2.8 software against the UniProt Homo sapiens reference proteome database with isoforms (UP000005640, last modified 2015-01-16, entry count 90,434) [23,24]. Standard settings were employed, with the following abundant peptide modifications included in the search: carbamidomethylation (C) (fixed), N-terminal protein acetylation (variable), oxidation (M) (variable), and deamidation (N or Q) (variable) [11,12]. The match between runs feature in MaxQuant was enabled to allow the transfer of confident peptide identifications across LC-MS/MS runs, based on accurate mass-to-charge and retention time. Identified proteins and peptides were filtered to < 1% false discovery rate [25]. Label-free quantitation was enabled in MaxQuant to report protein and peptide relative quantities using standard parameters.
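For readers who want to reuse the deposited search results, the sketch below shows one way to read LFQ values from a tab-separated proteinGroups.txt file. It assumes the column names commonly written by MaxQuant ("Protein IDs", "LFQ intensity <run>", "Reverse", "Potential contaminant", "Only identified by site"); these should be checked against the tables.pdf in the deposited archive. The demo rows are invented for illustration.

```python
import csv
import io

# Assumed MaxQuant flag columns; a "+" marks rows that are usually dropped.
FLAG_COLUMNS = ("Reverse", "Potential contaminant", "Only identified by site")
LFQ_PREFIX = "LFQ intensity "


def read_lfq(handle):
    """Map each retained protein group to {run name: LFQ intensity}."""
    reader = csv.DictReader(handle, delimiter="\t")
    table = {}
    for row in reader:
        # Skip decoy hits, contaminants, and site-only identifications.
        if any((row.get(col) or "").strip() == "+" for col in FLAG_COLUMNS):
            continue
        table[row["Protein IDs"]] = {
            name[len(LFQ_PREFIX):]: float(value or 0)
            for name, value in row.items()
            if name and name.startswith(LFQ_PREFIX)
        }
    return table


# Invented demo content in the assumed layout.
DEMO = (
    "Protein IDs\tLFQ intensity Ctrl_10_1\tLFQ intensity Ctrl_10_2\t"
    "Reverse\tPotential contaminant\tOnly identified by site\n"
    "P12345\t1000\t1200\t\t\t\n"
    "REV__P99999\t500\t0\t+\t\t\n"
    "CON__P00761\t800\t900\t\t+\t\n"
)
lfq_table = read_lfq(io.StringIO(DEMO))
```

For the real file, open proteinGroups.txt with `open(path, newline="", encoding="utf-8")` and pass the handle to `read_lfq`. An LFQ value of 0 indicates that no quantity was reported for that run.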

Funding sources
The Lundbeck Foundation Denmark (R181-2014-3372) and the Carlsberg Foundation (CF14-0561) are acknowledged for grants enabling the project (TBB grants). Knud and Edith Eriksens Memorial Foundation ("Knud og Edith Eriksens Mindefond") and Ferring are acknowledged for grants enabling the collection of the biological sample material (VA grant). The Obelske Family Foundation and the Svend Andersen Foundation are acknowledged for grants supporting the analytical platform being part of the Danish National Platform for Proteomics (PRO-MS) (AS grants).