Data set for mass spectrometric analysis of recombinant human serum albumin from various expression systems

Human serum albumin (HSA) is a versatile and important protein for the pharmaceutical industry (Fanali et al., Mol. Aspects Med. 33(3) (2012) 209–290). Due to the potential transmission of pathogens from plasma-sourced albumin, numerous expression systems have been developed to produce recombinant HSA (rHSA) (Chen et al., Biochim. Biophys. Acta (BBA) - Gen. Subj. 1830(12) (2013) 5515–5525; Kobayashi, Biologicals 34(1) (2006) 55–59). Based on our previous study showing increased glycation of rHSA expressed in Asian rice (Frahm et al., J. Phys. Chem. B 116(15) (2012) 4661–4670), both supplier-to-supplier and lot-to-lot variability of rHSAs from a number of expression systems were evaluated using reversed-phase liquid chromatography linked with MS and MS/MS analyses. The data are associated with the research article ‘Determination of Supplier-to-Supplier and Lot-to-Lot Variability in Glycation of Recombinant Human Serum Albumin Expressed in Oryza sativa’, where further analysis of rHSA samples with additional biophysical methods can be found (Frahm et al., PLoS ONE 9(10) (2014) e109893). We determined that all rHSA samples expressed in rice showed elevated levels of arginine and lysine hexose glycation compared to rHSA expressed in yeast, suggesting that the extensive glycation of the recombinant proteins is a by-product of either the expression system or the purification process and not a random occurrence.


Value of the data
• Represents a robust method for profiling variation in glycation of commercial products.
• Demonstrates the utility of a multi-enzyme approach for extensive protein sequence coverage, allowing detailed analysis/comparison of non-complex proteomic samples.
• Provides a model data set for qualitative and quantitative proteomics studies.
• Covers the entire data generation/analysis process, from raw data, to processed peak lists, to search results, to exported result files.

Experimental design, materials and methods
Experiments were performed as described in ‘Determination of Supplier-to-Supplier and Lot-to-Lot Variability in Glycation of Recombinant Human Serum Albumin Expressed in Oryza sativa’ [1,2,3,5].

Albumin sample preparation
Albumin samples were prepared as described previously [4]. Briefly, samples were buffer exchanged into 5 mM sodium phosphate (pH 7.4) with Amicon Ultra 0.5 ml 3000 Da MWCO centrifugal filter devices after pre-rinsing the filters with buffer. Protein concentrations were measured using a BCA assay kit (Sigma-Aldrich, St. Louis, MO, USA). Protein integrity after buffer exchange was assessed with 1-D SDS-PAGE using SYPRO Ruby protein stain (Molecular Probes, Eugene, OR, USA) and a Bio-Rad Molecular Imager Gel Doc XR+ system with Quantity One 1-D analysis software according to the manufacturer's instructions (Bio-Rad, Mississauga, ON, Canada). Trypsin and chymotrypsin solutions were prepared by adding 1000 µl of 50 mM NH4HCO3 to 20 µg and 25 µg of lyophilized protein, respectively. One 125 µl aliquot from each sample was digested with trypsin and the other with chymotrypsin. Digestion was carried out by spinning 100 µl of trypsin solution through the filters at 10,000 × g over 20 min, then spinning 100 µl of 50 mM NH4HCO3 through the filter under the same conditions. The flow-through from the digestion steps was collected in clean tubes, evaporated to dryness in a vacuum centrifuge, and then re-suspended in 40 µl of injection solvent (3% acetonitrile, 0.2% formic acid and 0.05% TFA in water) prior to LC-MS/MS analysis.

LC-MS analysis-Sample analysis
For each sample, triplicate 2 µl aliquots were analyzed by loading onto a Waters Symmetry C18 trap column (180 µm × 20 mm with 5 µm beads) and desalting with 0.1% formic acid in water (solvent A) for 3 min at 5.0 µl/min before separating on a Waters nanoAcquity UPLC BEH130 C18 reverse phase analytical column (100 µm × 100 mm with 1.7 µm beads). Chromatographic separation was achieved at a flow rate of 0.500 µl/min over 70 min in six linear steps as follows (solvent B was 0.1% formic acid in acetonitrile): initial, 3% B; 2 min, 10% B; 40 min, 30% B; 50 min, 95% B; 55 min, 95% B; 56 min, 3% B; final, 3% B. The eluting peptides were analyzed by MS and MS/MS using a Waters Synapt HDMS system operating in data directed acquisition (DDA) mode. MS survey scans were 1 s in duration and MS/MS data were collected on the four most abundant peaks until either the total ion count exceeded 3000 or 3 s had elapsed. Within each analysis, redundant analyses were limited by excluding selected peaks (±1.15 mass-to-charge, m/z) for 60 s. Between triplicate analyses, previously selected peaks were prevented from being reanalyzed by using m/z (±1.15) and retention time (±60 s) as exclusion criteria. Peaks from singly-charged peptides were also excluded from selection for MS/MS analysis. The instrument was calibrated prior to sample analysis using the fragmentation products of [Glu1]-Fibrinopeptide B. Calibration accuracy was maintained throughout the analyses using a nano-lock spray of 100 fmol/µl [Glu1]-Fibrinopeptide B, which was sampled for 1 s once every 30 s. The lock mass correction was applied to the data during processing.
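The gradient described above can be read as a piecewise-linear program in solvent B. The following Python sketch is illustrative only and is not part of the instrument method: the names `GRADIENT` and `percent_b` are hypothetical, and the end time of 70 min for the "final" step is an assumption taken from the stated 70 min run length.

```python
from bisect import bisect_right

# (time in min, % solvent B) breakpoints as listed in the text.
# Assumption: the "final" step ends at 70 min, the stated run length.
GRADIENT = [(0, 3), (2, 10), (40, 30), (50, 95), (55, 95), (56, 3), (70, 3)]


def percent_b(t_min, program=GRADIENT):
    """Return % solvent B at time t_min by linear interpolation between breakpoints."""
    times = [t for t, _ in program]
    if not times[0] <= t_min <= times[-1]:
        raise ValueError("time is outside the gradient program")
    # Index of the last breakpoint at or before t_min.
    i = bisect_right(times, t_min) - 1
    if i == len(program) - 1:
        # Exactly at the final breakpoint: no following segment to interpolate.
        return float(program[-1][1])
    (t0, b0), (t1, b1) = program[i], program[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 2, 21, 40, 52.5, 55.5, 70):
        print(f"{t:>5} min: {percent_b(t):5.1f}% B")
```

Seven breakpoints give the six linear segments mentioned in the text; for example, halfway through the 2 to 40 min segment (21 min) the composition is 20% B.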

LC-MS analysis-Qualitative data processing
Data were qualitatively analyzed using the Mascot software package, available from Matrix Science Ltd. (Boston, MA, U.S.A.). The raw data were processed using Mascot Distiller (version 2.4.2) to create Mascot Generic Files (MGFs), and database searches were performed using Mascot (version 2.4) against the human protein entries in the 2013_01 UniProtKB/Swiss-Prot database. MGF files from the triplicate analyses of both the trypsin and chymotrypsin digests were combined and submitted as a single search for each sample. Peptide and MS/MS mass tolerances were 100 ppm and 0.1 Da, respectively, and semi-tryptic and semi-chymotryptic peptides, from 2+ to 5+ charge state and having up to three missed cleavages, were considered. Carbamidomethylation of cysteine was specified as a fixed modification, and oxidation of methionine and hexose addition on lysine and arginine were considered as variable modifications. The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD001248 and DOI 10.6019/PXD001248.
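To make the search tolerances and modification masses concrete, the sketch below computes the precursor window implied by a 100 ppm peptide tolerance and the m/z shift a modification produces at a given charge state. It is a minimal illustration, not the Mascot implementation: the function names and dictionary keys are hypothetical, and the monoisotopic mass shifts are standard values supplied here rather than taken from the text.

```python
# Monoisotopic mass shifts in Da (standard values, not stated in the text).
# Keys are illustrative labels, not Mascot modification names.
MOD_MASS = {
    "carbamidomethyl": 57.02146,  # fixed, on cysteine
    "oxidation": 15.99491,        # variable, on methionine
    "hexose": 162.05282,          # variable, on lysine and arginine
}


def ppm_window(mz, ppm=100.0):
    """Return the (low, high) m/z bounds for a relative tolerance in ppm."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def mz_shift(mod, charge):
    """Return the m/z shift caused by one modification at the given charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return MOD_MASS[mod] / charge


if __name__ == "__main__":
    low, high = ppm_window(1000.0)
    print(f"100 ppm window at m/z 1000: {low:.4f} to {high:.4f}")
    for z in range(2, 6):  # charge states 2+ to 5+ were considered in the search
        print(f"hexose shift at {z}+: {mz_shift('hexose', z):.4f} m/z")
```

At m/z 1000, a 100 ppm tolerance corresponds to ±0.1 m/z, and a single hexose addition on a doubly charged peptide shifts the precursor by about 81.03 m/z.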