Supporting data for the MS identification of distinct transferrin glycopeptide glycoforms and citrullinated peptides associated with inflammation or autoimmunity

This data article presents the results of all the statistical analyses applied to the relative intensities of the detected 2D-DiGE protein spots for each of the 3 performed DiGE experiments. The data reveals specific subsets of protein spots with significant differences between WT and CD38-deficient mice with either Collagen-induced arthritis (CIA), or with chronic inflammation induced by CFA, or under steady-state conditions. This article also shows the MS data analyses that allowed the identification of the protein species which serve to discriminate the different experimental groups used in this study. Moreover, the article presents MS data on the citrullinated peptides linked to specific protein species that were generated in CIA+ or CFA-treated mice. Lastly, this data article provides MS data on the efficiency of the analyses of the transferrin (Tf) glycopeptide glycosylation pattern in spleen and serum from CIA+ mice and normal controls. The data supplied in this work is related to the research article entitled “identification of multiple transferrin species in spleen and serum from mice with collagen-induced arthritis which may reflect changes in transferrin glycosylation associated with disease activity: the role of CD38” [1]. All mass spectrometry data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with identifiers PRIDE: PXD002644, PRIDE: PXD002643, PRIDE: PXD003183 and PRIDE: PXD003163.

© 2016 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Subject area
Biology

More specific subject area
Proteomics and glycoproteomics

Type of data
Tables, figures and raw data

How data was acquired
Scanned 2D-DiGE images were analyzed with the DeCyder 7.0 software (GE Healthcare), using the Differential In-gel Analysis (DIA) module to detect and normalize the protein spots. Protein relative abundance across all samples was determined, and statistical analyses were performed, using the Biological Variation Analysis (BVA) module of the DeCyder software. MS data for protein identification were acquired using a MALDI TOF/TOF UltrafleXtreme (Bruker) or a 4800 MALDI-TOF/TOF Analyzer (AB SCIEX). μLC-TOF-MS data for the analysis of the glycopeptide glycoforms of Tf were acquired with a 1200 series capillary liquid chromatography system (Agilent Technologies) coupled to a 6220 oa-TOF LC/MS mass spectrometer with an orthogonal G1385-44300 interface (Agilent Technologies).

Data format
Analyzed (Excel files and Word tables) and raw data

Experimental factors
Mice with collagen-induced arthritis, or with chronic inflammation, or with no treatment. Protein extraction and/or purification from spleen or serum samples. CyDye labeling. 2-D gel electrophoresis.

Experimental features
Protein extracts from mice subjected to different experimental conditions were analyzed by 2D-DiGE, and protein species that differed in abundance were identified by MS/MS. PTMs such as citrullination of the identified proteins, or glycosylation of Tf species, were further analyzed by MS.

Data source location
UB: Barcelona; UCO: Córdoba; IPBLN: Granada.

Data accessibility
Data are within this article. Data are also available at the ProteomeXchange Consortium via the PRIDE partner repository, PRIDE: PXD002644, PRIDE: PXD002643, PRIDE: PXD003183 and PRIDE: PXD003163.

Value of the data
Application of μLC-TOF-MS for characterization of multiple glycopeptide glycoforms from mouse transferrin.
Investigation of altered transferrin glycopeptide glycosylation patterns in inflammatory and/or autoimmune diseases.
Mass spectrometry approach to identify new citrullinated peptides in mice with arthritis (CIA model).
Properly described approach for 2D-DiGE analysis to identify protein species that differ in abundance due to certain pathologies.
Basis for the study of altered protein species associated with inflammatory processes or arthritis in humans.

Data
Fig. 1 shows the extracted ion chromatograms (EICs) obtained by μLC-TOF-MS for the most abundant glycopeptide glycoforms of Tf isolated from WT non-immunized serum, Tf standard in a 2D gel, and Tf from a spleen extract in a 2D gel. Tables 1 and 2, in Excel format, list the protein species identified by MS/MS, with the sequences of the matched and fragmented peptides of each protein. Table 3 lists the protein species identified by PMF. Tables 4-9 include the results of all the statistical analyses applied to the relative intensities of the detected 2D-DiGE protein spots for each of the 3 performed DiGE experiments. Table 10 shows the identities of the citrullinated peptides linked to specific protein species in CIA+ or CFA-treated mice. Table 11 shows the peptide coverage of mouse Tf standard digested with trypsin, and Table 12 shows the normalized peak area and %RSD of the Tf glycopeptide glycoforms detected by μLC-TOF-MS in the spots of spleen protein extracts subjected to 2D electrophoretic separation and in-gel tryptic digestion.

The sequence of matched and fragmented peptides of the identified proteins, plus the ion scores and confidence intervals of the fragmented peptides, can be found in the online version of this article (Table 1, .xlsx file) as supplementary material.
a Spots are named as indicated on the 2-DE gel shown in Fig. 1 in Ref. [1].
b UniProtKB/Swiss-Prot accession number, MASCOT protein score, protein score confidence interval (C.I. %), total ion score, and total ion score confidence interval (C.I. %) are reported for the combined search of MALDI-TOF/TOF MS and MS/MS data (GPS Software, Applied Biosystems).
c For protein scores, only confidence intervals above 99% were considered statistically significant.
d For total ion scores, only confidence intervals above 95% were considered statistically significant.
e Theoretical molecular weights and isoelectric points are given for each protein.

Mice
WT mice were purchased from Harlan Ibérica (Barcelona, Spain). Mice deficient in CD38 (CD38-KO) were backcrossed onto the B6 background for more than 12 generations, as described previously [3]. All studies with live animals were approved by the IPBLN and Universidad de Cantabria Institutional Laboratory Animal Care and Use Committees.

Induction and assessment of arthritis
For the induction of CIA, 8-12-week-old male mice were immunized as previously described [4,5].

Protein extraction from spleen preparations
Proteins were extracted from spleen using the MicroRotofor Lysis Kit (for mammalian tissues and cells) (Bio-Rad, Ref 163-2141), following the manufacturer's instructions, which include the use of mini-grinders for effective disruption of cells and tissues. Excess salts and other contaminants were removed using the Bio-Rad ReadyPrep 2-D cleanup kit. Samples were then resuspended in a DIGE-compatible buffer (7 M urea, 2 M thiourea, 4% CHAPS, 20 mM Tris, pH 8.5), quantified using the RC DC assay, and kept at −20 °C until further use.

Design of DiGE experiments
Unless otherwise indicated in each DiGE experiment conducted, four biological replicates of each condition were compared, comprising protein samples derived from four CD38-KO mice and four WT mice as previously described [1,6].

DiGE labeling and two-dimensional gel electrophoresis
Table 6
Spleen protein species that differ in abundance by 2-ANOVA-Interaction in two groups of Col.II-immunized mice (CD38 KO and B6 WT) with two conditions: CIA+ and CIA−.

Samples were aliquoted at 45 μg, and the pooled internal standard was made by combining 23 μg of each of the sixteen test samples. The proteins were labeled with 400 pmol of CyDye (in 1 μL of anhydrous DMF) per 50 μg of protein, as per the manufacturer's instructions (GE Healthcare). After labeling, the appropriate samples were combined for each gel. Each combined sample (~50 μL) was [...] [7], with the following modifications: (1) first-dimension IPG strips (Bio-Rad: 11 cm, linear pH 3-10 gradient); (2) active in-gel rehydration at 50 V for 12 h at 20 °C; (3) the IPG strips were focused in a one-step procedure, at 8000 V for a total of 35,000 Vh at 20 °C, with a current limit of 50 μA/strip. After electrophoresis, one of the gels was pre-scanned using the Typhoon 9400 variable mode imager at each of the appropriate CyDye excitation wavelengths (Cy3 (532 nm), Cy5 (633 nm), Cy2 (488 nm)) in order to determine the appropriate laser intensity for each CyDye. Thereafter, each of the analytical gels was scanned at this optimum laser intensity at a resolution of 100 μm. Gels were then fixed, stained with SYPRO Ruby (Bio-Rad) and re-scanned using the 488 nm laser. Scanned images were analyzed with the DeCyder 7.0 software (GE Healthcare), using the Differential In-gel Analysis (DIA) module to detect and normalize the protein spots. The Cy2-labeled internal standard was used to normalize gels by calculating the standardized abundance of each spot, i.e., the ratio of either the Cy3 or the Cy5 signal to that of Cy2.
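The standardized abundance defined above is a simple per-spot ratio. The following minimal Python sketch illustrates it under stated assumptions: the spot volumes are invented numbers, the function name is hypothetical, and the additional normalization steps performed internally by the DeCyder software are not reproduced here.

```python
import math


def standardized_abundance(sample_signal: float, cy2_signal: float) -> float:
    """Ratio of a Cy3 or Cy5 spot signal to the Cy2 internal-standard signal."""
    if cy2_signal <= 0:
        raise ValueError("Cy2 (internal standard) signal must be positive")
    return sample_signal / cy2_signal


# Hypothetical spot volumes (arbitrary units) for one spot on one gel.
spot = {"Cy2": 1200.0, "Cy3": 1800.0, "Cy5": 600.0}

cy3_ratio = standardized_abundance(spot["Cy3"], spot["Cy2"])
cy5_ratio = standardized_abundance(spot["Cy5"], spot["Cy2"])

# Ratios are often log-transformed before statistical testing (an assumption
# about downstream use, not a statement about the DeCyder implementation).
log_cy3 = math.log10(cy3_ratio)
log_cy5 = math.log10(cy5_ratio)
```

Because every gel carries the same pooled Cy2 standard, ratios computed this way can be compared between gels.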

Protein identification by MALDI-TOF/TOF MS/MS
In-gel digestion of proteins has been described previously [8]. A set of protein spots was identified by MS/MS using a 4800 MALDI-TOF/TOF Analyzer (AB SCIEX) in automatic mode with the settings described previously [6]. Protein identification was assigned by peptide mass fingerprinting and confirmed by MS/MS analysis of at least three peptides in each sample. The Mascot 2.0 search engine (Matrix Science), running on GPS software (Applied Biosystems), was used for protein identification against the SwissProt Mus musculus database (uniprot_sprot_26042011.fasta). The search settings allowed one missed cleavage with trypsin as the selected enzyme, an MS/MS fragment tolerance of 0.2 Da and a precursor mass tolerance of 100 ppm.
Other spots were identified by MS/MS using a MALDI TOF/TOF UltrafleXtreme (Bruker) in manual mode as previously described [6]. Fragment selection criteria were a minimum S/N ratio of 15 and a maximum number of peaks set at 200. For each precursor selected for MS/MS analysis, fragment mass values in the range from 13 Da to 4 Da below precursor mass were used for peptide identification.
Protein identification was assigned by peptide mass fingerprinting and confirmed by MS/MS analysis of 5 peptides. Mascot Server 2.4 (Matrix Science) and ProteinScape 3.1 (Bruker) were used for protein identification against the SwissProt Mus musculus database (SwissProt_2015_06.fasta and NCBInr_20150409.fasta). The search settings allowed two missed cleavages with trypsin as the selected enzyme, cysteine carbamidomethylation as fixed modification, methionine oxidation as variable modification, an MS/MS fragment tolerance of 0.5 Da and a precursor mass tolerance of 50 ppm, unless otherwise indicated.
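To make the precursor tolerances quoted above (100 ppm and 50 ppm) concrete, the sketch below shows how a relative mass error in ppm is typically computed and compared with a tolerance. The function names and the example masses are hypothetical; this is not the Mascot implementation.

```python
def ppm_error(observed: float, theoretical: float) -> float:
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True when the absolute ppm error does not exceed the tolerance."""
    return abs(ppm_error(observed, theoretical)) <= tol_ppm


# Hypothetical precursor: theoretical mass 1500.00, observed mass 1500.12.
theoretical_mass = 1500.00
observed_mass = 1500.12

error = ppm_error(observed_mass, theoretical_mass)                    # about 80 ppm
ok_100 = within_tolerance(observed_mass, theoretical_mass, 100.0)     # looser setting
ok_50 = within_tolerance(observed_mass, theoretical_mass, 50.0)       # stricter setting
```

In this invented case the precursor would be accepted under the 100 ppm setting but rejected under the 50 ppm setting.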
The MS spectra of the identified proteins were further examined in order to detect the presence of citrullinated proteins. Protein citrullination (or deimination) is the enzymatic conversion of peptidylarginine residues to peptidylcitrulline, mediated by the family of calcium-dependent peptidylarginine deiminases (PADs) [9]. The MASCOT search settings for this PTM were the same as in the previous paragraph, with the deimination of arginine included as a variable modification and with the following considerations [10]: (a) for one citrullinated arginine, the peptide theoretical mass increases by 0.98 Da and the modified peptide, losing one amino group, becomes more acidic; (b) citrullinated arginine residues are not likely to be cleaved by trypsin, so a minimum of one missed cleavage must be specified; (c) a peptide that includes a C-terminal citrullinated arginine must be rejected; (d) citrullinated peptides generate an unusual isotopic mass cluster compared with that of unmodified peptides.
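Considerations (a) to (c) can be expressed as a small filter. The Python sketch below is only an illustration: the peptide sequences and function names are hypothetical, the monoisotopic shift of 0.984016 Da is the standard value that rounds to the 0.98 Da quoted above, and missed cleavages are counted simply as internal K or R residues (the proline exception to tryptic cleavage is ignored).

```python
CITRULLINATION_SHIFT = 0.984016  # Da, monoisotopic shift per citrullinated Arg


def citrullinated_mass(unmodified_mass: float, n_sites: int) -> float:
    """Consideration (a): add the mass shift once per citrullinated arginine."""
    return unmodified_mass + n_sites * CITRULLINATION_SHIFT


def passes_filters(sequence: str, cit_positions: list[int]) -> bool:
    """Apply considerations (b) and (c) to a candidate tryptic peptide.

    cit_positions holds 0-based indices of the citrullinated arginines.
    """
    if not cit_positions:
        return False
    for pos in cit_positions:
        if sequence[pos] != "R":
            raise ValueError(f"position {pos} is not an arginine")
    # (c) reject peptides whose C-terminal residue is a citrullinated arginine.
    if len(sequence) - 1 in cit_positions:
        return False
    # (b) require at least one missed cleavage, i.e. an internal K or R.
    missed_cleavages = sum(1 for aa in sequence[:-1] if aa in "KR")
    return missed_cleavages >= 1


# Hypothetical candidates, not peptides from the reported data set.
accepted = passes_filters("AGFRLSK", [3])   # internal citrulline, one missed cleavage
rejected = passes_filters("AGFLSR", [5])    # C-terminal citrulline
shifted = citrullinated_mass(1000.0, 1)
```

Consideration (d), the unusual isotopic cluster, requires inspection of the spectrum itself and is not covered by this sketch.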

Table 11
Detected peptides in a tryptic digest of standard mTf analyzed by μLC-TOF-MS.
Detected peptides in mTf standard

μLC-TOF-MS
The μLC-TOF-MS experiments were performed in a 1200 series capillary liquid chromatography system coupled to a 6220 oa-TOF mass spectrometer with an orthogonal G1385-44300 interface (Agilent Technologies). LC and MS control, separation, data acquisition and processing were performed using MassHunter workstation software (Agilent Technologies). The oa-TOF mass spectrometer was tuned and calibrated following the manufacturer's instructions. Once a day, or even twice a day when required, a "Quick Tune" of the instrument was carried out in positive mode, followed by a mass-axis calibration to ensure accurate mass assignments. In order to enhance detection sensitivity of glycopeptides, no internal recalibration was used [11]. MS measurement parameters were as described in a previous work [12]: capillary voltage 4000 V, drying gas (N₂) temperature 200 °C, drying gas flow rate 4 L min⁻¹, nebulizer gas (N₂) 15 psig, fragmentor voltage 215 V, skimmer voltage 60 V, OCT 1 RF Vpp voltage 300 V. Data were collected in profile (continuum) mode at 1 spectrum s⁻¹ (approx. 10,000 transients/spectrum) between m/z 100 and 3200, working in the highest resolution mode (4 GHz). For separation, a Zorbax 300SB-C18 column (3.5 μm particle diameter, 300 Å pore diameter, 150 mm × 0.3 mm LT × id, Agilent Technologies) was used. Experiments were performed at room temperature with gradient elution at a flow rate of 4 μL min⁻¹. Eluting solvents were A: water with 0.1% (v/v) formic acid, and B: acetonitrile with 0.1% (v/v) formic acid. Solvents were degassed for 10 min by sonication before use. The optimum elution program was: solvent B from 10% to 60% (v/v) within 45 min as a linear gradient, followed by cleaning and re-equilibration steps of B: 60% to 100% (v/v) (5 min), 100% (v/v) (10 min), 100% to 10% (v/v) (5 min) and 10% (v/v) (10 min).
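The elution program above can be written as a piecewise-linear function of time, which is convenient when checking at which solvent composition a given peak elutes. The sketch below is a hypothetical helper, not part of the MassHunter software; the breakpoints are taken directly from the program described, giving a total run time of 75 min.

```python
# (time in min, % solvent B) breakpoints of the elution program.
GRADIENT = [
    (0.0, 10.0),    # start
    (45.0, 60.0),   # linear gradient 10% -> 60% B in 45 min
    (50.0, 100.0),  # cleaning ramp 60% -> 100% B in 5 min
    (60.0, 100.0),  # hold at 100% B for 10 min
    (65.0, 10.0),   # return 100% -> 10% B in 5 min
    (75.0, 10.0),   # re-equilibration at 10% B for 10 min
]


def percent_b(t: float) -> float:
    """Linearly interpolate the % of solvent B at time t (min)."""
    if t < GRADIENT[0][0] or t > GRADIENT[-1][0]:
        raise ValueError("time outside the elution program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)
    raise RuntimeError("unreachable: breakpoints cover the full program")


total_run_time = GRADIENT[-1][0]
mid_gradient = percent_b(22.5)   # halfway through the analytical gradient
```

The calculation ignores the system dwell volume, so the composition actually reaching the column lags behind the programmed value.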
Before analysis, samples were filtered using a 0.22 μm polyvinylidene difluoride centrifugal filter (Ultrafree-MC, Millipore, Bedford, MA, USA) at 12,000 rpm for 4 min. Sample injection was performed with an autosampler refrigerated at 4 °C, and the injection volume was 1 μL when analyzing Tf isolated from serum samples and digested with trypsin, and 5 μL when analyzing Tf in-gel digests.

μLC-TOF-MS data analysis
Prior to data analysis, a database with the exact monoisotopic mass of the different glycopeptide glycoforms of mouse Tf was created using Excel. To calculate the monoisotopic mass of each glycopeptide glycoform, it was necessary to calculate the elemental composition of all the glycopeptides, taking into account the peptide and glycan contributions. First, the peptide sequence of mouse Tf was obtained from the UniProt Knowledgebase (Q921I1), which also includes information about which cysteines and asparagines are involved in disulfide bonds and in N-glycosylation sites, respectively. Afterwards, the theoretical sequence of each peptide and glycopeptide that would be obtained after tryptic digestion was obtained using the proteomic tool PeptideMass from the ExPASy bioinformatics resource portal. Subsequently, the elemental composition of the peptide sequence of the glycopeptide was obtained using the ProtParam tool from ExPASy. Furthermore, the elemental composition of each glycan was calculated as the sum of the elemental compositions of the monosaccharides that form the glycan. The IonSource webpage was used to obtain the elemental composition of each monosaccharide. Finally, the elemental composition of the peptide was added to that of the glycan to obtain the molecular formula of each possible glycopeptide glycoform and thus its monoisotopic mass, to four decimal places.

Afterwards, the mass-to-charge values (m/z) for each glycopeptide glycoform were calculated up to a z value of 5, considering proton adducts (i.e. [M+H]+ to [M+5H]5+). Finally, the data analysis was carried out using the software MassHunter Qualitative (Agilent Technologies). All the previously calculated m/z values for each glycopeptide glycoform were extracted together to obtain an extracted ion chromatogram (EIC) of that glycopeptide species, as can be observed in Fig. 1, which shows the EIC for some glycopeptide glycoforms in three different samples.
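The mass and m/z calculation described above, which the authors carried out in Excel, can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' spreadsheet: the peptide composition is an invented placeholder and not a real Tf peptide, the glycan is an arbitrary example, and the monosaccharide entries are residue (dehydrated) compositions, so that the glycan contribution can be added directly to the peptide formula.

```python
from collections import Counter

# Monoisotopic atomic masses (Da) and the proton mass.
MONO = {"C": 12.0, "H": 1.00782503, "N": 14.0030740,
        "O": 15.9949146, "S": 31.9720707}
PROTON = 1.00727647

# Residue compositions of common monosaccharides (free sugar minus H2O).
RESIDUES = {
    "Hex":    {"C": 6,  "H": 10, "O": 5},
    "HexNAc": {"C": 8,  "H": 13, "N": 1, "O": 5},
    "Fuc":    {"C": 6,  "H": 10, "O": 4},
    "NeuAc":  {"C": 11, "H": 17, "N": 1, "O": 8},
    "NeuGc":  {"C": 11, "H": 17, "N": 1, "O": 9},
}


def glycan_composition(glycan: dict[str, int]) -> Counter:
    """Sum the elemental compositions of all monosaccharide residues."""
    total = Counter()
    for name, count in glycan.items():
        for element, n in RESIDUES[name].items():
            total[element] += n * count
    return total


def monoisotopic_mass(composition: dict[str, int]) -> float:
    """Monoisotopic mass from an elemental composition."""
    return sum(MONO[element] * n for element, n in composition.items())


def mz_values(mass: float, max_z: int = 5) -> dict[int, float]:
    """m/z of the proton adducts [M+zH]z+ for z = 1..max_z, four decimals."""
    return {z: round((mass + z * PROTON) / z, 4) for z in range(1, max_z + 1)}


# Hypothetical peptide composition (placeholder) and an example glycan.
peptide = Counter({"C": 50, "H": 80, "N": 14, "O": 16})
glycan = {"Hex": 5, "HexNAc": 4, "NeuGc": 2}

glycopeptide = peptide + glycan_composition(glycan)
mass = monoisotopic_mass(glycopeptide)
mz_list = mz_values(mass)
```

The values in `mz_list` are those that would be extracted together to build the EIC of one glycoform.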
If more than one of the extracted masses is detected in one chromatographic peak of the EIC, the presence of the corresponding glycopeptide glycoforms can be confirmed. Tables 1 and 2 can be found in the online version of this article (.xlsx files). They show the list of protein species identified by MS/MS, displaying the sequence of matched and fragmented peptides of a given protein. Ion scores and confidence intervals of the fragmented peptides are also shown.