A Filtered Database Search Algorithm for Endogenous Serum Protein Carbonyl Modifications in a Mouse Model of Inflammation

During inflammation, the resulting oxidative stress can damage surrounding host tissue, forming protein carbonyls. The SJL mouse is an experimental animal model used to assess in vivo toxicological responses to reactive oxygen and nitrogen species from inflammation. The goals of this study were to identify the major serum proteins modified with a carbonyl functionality and to identify the types of carbonyl adducts. To select for carbonyl-modified proteins, serum proteins were reacted with an aldehyde reactive probe that biotinylated the carbonyl modification. Modified proteins were enriched by avidin affinity and identified by two-dimensional liquid chromatography tandem MS. To identify the carbonyl modification, tryptic peptides from serum proteins were subjected to avidin affinity enrichment, and the enriched modified peptides were analyzed by liquid chromatography tandem MS. The aldehyde reactive probe tag produced tag-specific fragment ions and neutral losses, and these extra features in the mass spectra hindered identification of the modified peptides by database searching. To enhance the identification of carbonyl-modified peptides, a program (in silico filter program) was written that used the tag-specific fragment ions as a fingerprint and filtered the mass spectrometry data to highlight only modified peptides. A de novo-like database search algorithm (biotin peptide identification program) was written to identify the carbonyl-modified peptides. Although written specifically for our experiments, this software can be adapted to other modification and enrichment systems. Using these routines, a number of lipid peroxidation-derived protein carbonyls and protein carbonyls arising from direct side-chain oxidation were identified in SJL mouse serum.

During inflammation, activated phagocytes secrete reactive nitrogen species (RNS) and reactive oxygen species (ROS) that can eliminate infectious agents. If inflammation is chronic, RNS and ROS can also damage surrounding host tissue, resulting in protein modification in the form of protein-carbonyls (1). Total protein carbonylation has been used as a marker of oxidative stress and inflammation, and increased levels have been seen in heart disease, lung disease, aging, neurodegenerative disorders, and inflammatory bowel disease (2-7). The carbonylation of proteins can result from the direct oxidation of protein side-chains, forming the glutamate and aminoadipate semialdehydes (Scheme 1) (8,9), but can also occur from the indirect oxidation of polyunsaturated fatty acids (lipid peroxidation) and carbohydrates, leading to a variety of reactive aldehydes (Scheme 2) (10). These aldehydes covalently modify proteins through conjugate addition (often Michael addition) to nucleophilic amino acid side chains, creating protein-bound carbonyls (10,11).
In a previous study, oxidative DNA damage products in tissues from the SJL mouse model of inflammation were quantified (12). Only the lipid peroxidation adducts increased in association with inflammation, which suggested an important role of lipids in inflammatory disease progression and established a direct correlation between inflammation and the increased formation of reactive aldehydes from oxidized lipids. Although DNA modification caused by inflammation has been the focus of many animal and human studies, proteins are considered the molecules most likely to be ubiquitously affected by disease, response, and recovery (13), and the biological consequences of their modification include more rapid protein turnover as well as novel signaling (14-16). The overall carbonylation of proteins has been well documented in other inflammatory animal models, which have shown significant increases in protein-carbonyls in the mucosal lining of rat colon (17) and mouse colon (5), as well as increased levels of protein carbonyls in rat serum, along with a higher turnover of proteins from the inflamed tissue (18,19). Furthermore, increased protein carbonyl modification has been reported in studies of the colon mucosal lining from patients diagnosed with inflammatory bowel disease (20,21). Taken together, these observations suggest that an increase in carbonylated proteins is likely in the SJL mouse and that the extent and type of protein-carbonyls could potentially be a marker for inflammation and disease.
The SJL mouse is an experimental animal model used to assess in vivo toxicological responses to nitric oxide (NO) overproduction from inflammation (22). Injection of RcsX lymphoma cells into these mice results in rapid tumor growth as well as host T-cell proliferation in lymph nodes, spleen, and liver, leading to morbidity within 15 days. The induced macrophages produce a 50-fold increase in NO in spleen and lymph nodes, and the post-translational modification 3-nitrotyrosine was highly elevated in spleen tissue.
The identification of endogenously formed protein carbonyls in serum is challenging because of their low abundance and the large number of possible modifications (1,2,23), some of which are shown in Schemes 1 and 2. We recently identified proteins modified by the carbonyl 9,12-dioxo-10(E)-dodecenoic acid (DODE) in cells treated with the hydroperoxide of linoleic acid (13-HPODE) (24). This work used a technique first demonstrated by Maier and coworkers (25,26). Protein carbonyls were derivatized with an aldehyde reactive probe (ARP), a biotinylated hydroxylamine that reacts preferentially with aldehyde and keto groups (27), allowing subsequent enrichment of the modified proteins by avidin affinity. DODE-modified proteins were also identified using an anti-DODE antibody and Western blots. Although a number of DODE-modified proteins were identified, we were unable to definitively identify the carbonyl-modified peptides by mass spectrometry because of both their low abundance and the interference of ARP-tag-specific fragment ions with database searching.
In this current study, SJL mouse serum was screened for the presence of protein carbonyls endogenously formed during inflammation. Carbonyl-modified proteins were then identified using previously established techniques (24): first anti-DODE Western blotting, followed by ARP derivatization/enrichment and two-dimensional liquid chromatography tandem MS (2D-LC-MS/MS). These proteins then formed a database of putative carbonyl-modified proteins from SJL mouse serum. To identify the type of carbonyl modification and the modified peptide, the ARP-derivatized peptides were enriched and analyzed by mass spectrometry. To minimize the confounding effect of ARP fragmentation, an algorithm (in silico filter) was written that filtered the mass spectrometry data to select only those peptides containing the known ARP pattern of fragmentation. This alone effectively reduced the number of false positives. To further alleviate the interfering effects of ARP fragments on peptide identification by database searching, a de novo-like searching algorithm (Biotin Peptide Identification program, BPI) was written. Peptides were evaluated against the database of proteins that had previously been identified as potentially carbonyl modified. Because modified peptides were searched against a finite list of proteins and all results were manually evaluated, the BPI program did not calculate a statistical peptide score, which allowed the identification of lower-abundance modified peptides that would not be considered significant by standard search engines such as Mascot. The BPI program was also written with the flexibility to evaluate a wide range of known carbonyl-adduct masses and could therefore screen for a large number of carbonyl adducts at one time. This flexibility should also allow the program to be used with modification/enrichment systems other than the one used here.
The program thus selected a finite number of carbonyl modified peptides, resulting in the identification of a number of proteins that were endogenously carbonylated in serum from the SJL mouse inflammation model.
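The candidate-generation step implied by searching a finite protein list for several adduct masses at once can be sketched in Python, the language the programs were written in. This is a minimal illustration under stated assumptions, not the authors' BPI code: the function names (`tryptic_peptides`, `candidates`), the 0.01 m/z tolerance, the limit of one missed cleavage, and the toy protein sequence are all hypothetical. Only the DODE-ARP mass shift (539.2414 Da, from Materials and Methods) and the observation that a modified lysine causes a missed cleavage come from this paper; further carbonyl-ARP adduct masses would be added to `ADDUCTS`.

```python
# Hypothetical sketch (not the authors' BPI code): enumerate tryptic peptides from a
# finite protein list and keep those whose adduct-modified m/z matches a precursor.

# Monoisotopic residue masses (Da).
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565
PROTON = 1.007276

# Adduct mass shifts to screen. Only DODE-ARP comes from the paper.
ADDUCTS = {"DODE-ARP": 539.2414}


def tryptic_peptides(seq, max_missed=1):
    """Cleave after K/R unless followed by P; allow up to max_missed missed cleavages."""
    sites = [i + 1 for i, aa in enumerate(seq[:-1]) if aa in "KR" and seq[i + 1] != "P"]
    bounds = [0] + sites + [len(seq)]
    peptides = []
    for i in range(len(bounds) - 1):
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptides.append(seq[bounds[i]:bounds[j]])
    return peptides


def candidates(proteins, precursor_mz, charge, tol=0.01):
    """Return (protein, peptide, adduct) tuples whose modified m/z matches the precursor."""
    hits = []
    for name, seq in proteins.items():
        for pep in tryptic_peptides(seq):
            # Assumption: a modified lysine blocks cleavage, so require a non-C-terminal K.
            if "K" not in pep[:-1]:
                continue
            neutral = sum(RESIDUE[aa] for aa in pep) + WATER
            for adduct, delta in ADDUCTS.items():
                mz = (neutral + delta + charge * PROTON) / charge
                if abs(mz - precursor_mz) <= tol:
                    hits.append((name, pep, adduct))
    return hits


# Toy sequence (hypothetical) that contains the albumin peptide KQTALAELVK.
toy = {"toy_protein": "LVKKQTALAELVKAGR"}
print(candidates(toy, 820.458, charge=2))
# [('toy_protein', 'KQTALAELVK', 'DODE-ARP')]
```

Because the search space is restricted to the putative carbonyl-modified proteins, every surviving candidate can be inspected manually, which is consistent with the paper's choice not to compute a statistical score.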

MATERIALS AND METHODS
Materials-Aldehyde reactive probe (ARP) was purchased from Invitrogen (Eugene, OR) and biotin-PEO-LC-Amine was purchased from Pierce (Rockford, IL). Cytochrome c (equine heart), acetic acid, and trifluoroacetic acid were purchased from Sigma Chemical Co. (St. Louis, MO). Trypsin was purchased from Promega (Madison, WI). Gases were supplied by AirGas (Salem, NH). DODE was a generous gift from Prof. Ian A. Blair (University of Pennsylvania).
SJL Mouse Infection and Serum Extraction-RcsX cells (kindly supplied by Prof. Nicolas M. Ponzio, University of New Jersey Medical Center, Newark, NJ) were passaged through SJL mice (Jackson Laboratory, Bar Harbor, ME) and harvested from lymph nodes 14 days after inoculation according to published procedures (28). Cells were manually dissociated from lymph nodes, washed in phosphate-buffered saline (PBS; 140 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, 1.8 mM KH2PO4, pH 7.4), and frozen in aliquots of 5 × 10⁷ cells in 10% dimethyl sulfoxide/fetal bovine serum. To initiate NO overproduction, eight SJL mice (5-6 weeks old) were each injected intraperitoneally with 10⁷ RcsX cells in 200 µl of PBS. Ten mice were injected with 200 µl of PBS as unstimulated controls. Twelve days after injection, five treated and three control mice were anesthetized with isoflurane, and serum was collected by cardiac puncture. Pooled samples were desalted by filter centrifugation and dried in a SpeedVac. Protein content was determined with a commercial bicinchoninic acid (BCA) protein assay kit (Pierce).
SDS-PAGE of SJL Mouse Serum, One- and Two-Dimensional-One hundred micrograms (Coomassie) or 30 µg (Western) of sample was processed by two-dimensional gel electrophoresis for protein identification and Western blotting. For anti-biotin Western blotting and Coomassie staining, proteins were focused on 11-cm pH 4-7 immobilized pH gradient (IPG) strips (Immobiline™ DryStrip gels, Amersham Biosciences) using an IPGphor focusing apparatus (Amersham Biosciences). For anti-DODE Western blotting and Coomassie staining, proteins were focused on an 11-cm pH 3-10 gradient strip. Samples were applied by cup loading. IPG strips were then equilibrated in equilibration buffer (50 mM Tris-HCl, 6 M urea, 30% glycerol, 2% SDS) supplemented with 1% dithiothreitol to keep the proteins fully reduced, followed by 2.5% iodoacetamide to prevent reoxidation of thiol groups during electrophoresis. Proteins were separated on 12.5% Tris/Glycine gels (BioRad) using a Criterion System (BioRad). Proteins were visualized by Coomassie SimplyBlue Safe-Stain™ (Invitrogen).
In-Gel Digestion-Protein spots were picked and washed in Milli-Q® water for 15 min, then washed three times in 25 mM NH4HCO3/50% CH3CN for 30 min. Gel plugs were dehydrated in 100% CH3CN for 10 min while vortex-mixing. The supernatant was removed, and the plugs were dried in the SpeedVac. Trypsin (1 µg/50 µl) (Promega, Madison, WI) suspended in 25 mM NH4HCO3 was added, and gel plugs were rehydrated for 30 min on ice and then digested overnight at 37°C. The samples were then centrifuged, and the supernatant was removed. The pellet was resuspended in CH3CN with 1% TFA, vortexed, and sonicated for 30 min to release hydrophobic peptides. The supernatant was removed, combined with the previous supernatant, and stored at −20°C until MS/MS analysis.
Neutravidin Affinity Chromatography of ARP Derivatized Serum Proteins-Neutravidin was used to enrich ARP-carbonyl derivatized proteins from SJL serum. These proteins were identified (described below) to make up the protein database used in the Biotin Peptide Identification program. Immobilized neutravidin (Pierce, Rockford, IL) was packed into a column with dimensions ID = 9 mm, OD = 7 mm, height = 60 mm. The final column volume was 2 ml. All buffers and samples were brought to room temperature.
Neutravidin Affinity Chromatography of ARP Derivatized Serum Peptides-Neutravidin columns were prepared as described above. Serum proteins were digested in solution with trypsin as follows: 4 µg of sequencing grade trypsin (Promega) was added to 200 µg of protein in ~200 µl of 50 mM ammonium bicarbonate, and proteins were digested for 4 h at 37°C. Total serum peptide digests were added directly to the neutravidin column. Nonbiotinylated peptides were removed with 10 column volumes of PBS-2% CHAPS, five column volumes of PBS, and five column volumes of deionized water. Bound peptides were eluted with four column volumes of 0.4% TFA/80% acetonitrile. Elution fractions were combined, frozen, lyophilized, and stored at −20°C.
DODE Modification and ARP Derivatization of Mouse Serum Albumin and Cytochrome c-Mouse serum albumin and cytochrome c (1 mg/ml, 100 µl in pH 7.0 chelex-treated 100 mM HEPES buffer) were reacted with DODE (224 nmol in 10 µl of ethanol) in the presence of vitamin C (10 mM) at 37°C for 24 h. Proteins were filtered (regenerated cellulose, 3000 Da MWCO; Amicon, Billerica, MA) to remove unreacted DODE and vitamin C and brought up in PBS to a concentration of 1 mg/ml. DODE-modified protein samples (1 mg/ml) were incubated with ARP (10 mM) in a final volume of 1.0 ml of phosphate buffer, pH 5-6. The reaction was stirred vigorously for 12 h at room temperature.
The samples were filtered (regenerated cellulose 3000 Da MWCO; Amicon) to remove unreacted ARP.
Strong Cation Exchange (SCX) Separation for SJL Serum Proteins-Neutravidin elutions of proteins from SJL mouse serum were digested in solution with trypsin as follows: 4 µg of sequencing grade trypsin (Promega) was added to 200 µg of protein in ~200 µl of 25 mM ammonium bicarbonate, and proteins were digested for 4 h at 37°C. Digested proteins were desalted by SPE (Strata 50 µm, tri-Func, C18-E), dried, and redissolved in 100 µl of mobile phase A (described below) for SCX separation. Purified peptides were separated by SCX liquid chromatography using an Agilent 1100 LC system with a Polysulfoethyl A 100 × 4.6 mm, 5 µm column (Nest Group) at a flow rate of 0.25 ml/min. Mobile phase A was 10 mM sodium phosphate
SCHEME 2. Reactive aldehydes, arising from oxidation of polyunsaturated fatty acids and carbohydrates, can indirectly lead to protein carbonylation.
LC-MS/MS (Applied Biosystems QStar Elite)-The electrospray interface for this instrument uses a micro-tee (Upchurch Scientific, Oak Harbor, WA) with a 1-in. piece of platinum rod, inserted into one arm of the micro-tee, to supply the electrical connection. The electrospray voltage was typically 1600-1700 V, applied just upstream of the column. Data-dependent MS/MS analysis was performed on the three most intense peaks in each full-scan spectrum, selecting only multiply charged precursors (most nonpeptide background constituents are singly charged). Samples were spiked with 100 fmol of vasoactive intestinal peptide fragment (amino acids 1-12) as an internal standard for calibration and mass accuracy.
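The precursor-selection rule described above (the three most intense ions per survey scan, multiply charged only) can be illustrated with a short Python sketch. The function name, the (m/z, intensity, charge) tuple layout, and the example survey values are hypothetical; the real selection is performed by the instrument's acquisition software, and any additional logic it applies (for example, dynamic exclusion or intensity thresholds) is not described in this paper and is not modeled here.

```python
# Hypothetical illustration of data-dependent precursor selection, not vendor code.
from typing import List, Tuple

Ion = Tuple[float, float, int]  # (m/z, intensity, charge)


def select_precursors(survey: List[Ion], top_n: int = 3, min_charge: int = 2) -> List[Ion]:
    """Return the top_n most intense survey ions whose charge is at least min_charge."""
    # Singly charged ions are dropped first: most nonpeptide background is 1+.
    eligible = [ion for ion in survey if ion[2] >= min_charge]
    return sorted(eligible, key=lambda ion: ion[1], reverse=True)[:top_n]


survey = [(445.12, 9.0e4, 1), (820.46, 5.0e4, 2), (547.31, 7.5e4, 3),
          (612.80, 2.0e4, 2), (733.90, 3.0e4, 2)]
print(select_precursors(survey))
# [(547.31, 75000.0, 3), (820.46, 50000.0, 2), (733.9, 30000.0, 2)]
```

In the example the most intense ion (m/z 445.12) is skipped because it is singly charged, which is exactly the behavior that keeps background ions out of the MS/MS queue.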
Protein, Peptide, and Modification Identification and Characterization-Peak lists for the Applied Biosystems QSTAR Elite were generated as mgf files using Applied Biosystems Analyst QS 1.1 software.
Database searches were carried out using Mascot version 2.2 (Matrix Science). For protein identification, the NCBInr and/or the SwissProt mouse databases were searched. Parameters included: enzyme, trypsin; maximum missed cleavages = 2; fixed modification, carbamidomethyl (C) (two-dimensional gel analysis only); variable modification, methionine oxidation (M); precursor tolerance, 0.1 Da; MS/MS fragment tolerance, 0.2 Da. Proteins were identified with three or more peptides. Significance of a Mascot protein match was based on an expectation value of < 0.05 and a combined peptide score > 50.
For ARP-carbonyl modified peptides, the Mascot modification file was edited to add the DODE-ARP modification with elemental composition C24H37N5O7S (calculated monoisotopic mass 539.2414 Da) and specificity for lysine. In addition to the elemental composition of the modifying group, the Mascot modification file allows neutral losses to be specified, both for scoring fragment ion spectra and for the peptide neutral loss. The Mascot modification file also allows ions to be ignored in scoring ("Ignore Masses"). Neutral losses and ignored ions were determined experimentally for ARP-DODE. Neutral-loss fragments included the entire modifying group, C24
Programming Language-All programming for the in silico filter program and the Biotin Peptide Identification program was done in the Python programming language (http://www.python.org). The Python programs were written to run on the Macintosh OS X operating system. Python text files for the in silico filter program and the Biotin Peptide Identification program are available for viewing or download at: http://web.mit.edu/toxms/www/filters.htm.
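As a check on the modification mass entered into Mascot, the monoisotopic mass of C24H37N5O7S can be recomputed from standard isotope masses. The short Python sketch below is illustrative only: the function name and the five-element mass table are our own choices, restricted to the elements needed here, and the simple parser does not handle brackets or isotope labels. It reproduces the 539.2414 Da value stated above.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotope of each element needed here.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "S": 31.97207100}


def monoisotopic_mass(formula: str) -> float:
    """Sum isotope masses for a simple formula such as 'C24H37N5O7S' (no brackets)."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONO[element] * (int(count) if count else 1)
    return total


print(f"{monoisotopic_mass('C24H37N5O7S'):.4f}")  # 539.2414
```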

Identification of Putative Carbonyl Modified Proteins-Proteins modified by carbonyls were identified by methods previously established (24). SJL mice were injected with RcsX lymphoma cells as described in Materials and Methods, and control mice were injected with PBS. After 12 days, mice were bled and serum was separated by centrifugation to generate infected and control SJL mouse serum samples. Samples were aliquoted and stored at −80°C.
Twenty micrograms of ARP-derivatized and nonderivatized serum proteins from control and infected SJL mice were separated by one-dimensional SDS-PAGE and analyzed by Western blot using either Streptavidin-HRP (ARP-derivatized samples) or an anti-DODE antibody (nonderivatized samples) (Fig. 1). No DODE-modified proteins were detected in control mice, establishing the background level of DODE-modified proteins at essentially zero. Streptavidin-HRP Western blot analysis of carbonyl-modified proteins (ARP derivatized) from control (uninfected) SJL mouse serum showed a low-abundance background of carbonylation. This small amount of modified protein was below the detection limit of our mass spectrometry methods, and background carbonylation was therefore also considered to be zero. Although abundant serum proteins are generally considered poor markers of disease and inflammation, their carbonyl modification may be of interest; therefore the most abundant serum proteins, such as albumin and transferrin, were not depleted prior to analysis. This decision likely limited the identification of lower-abundance carbonyl-modified proteins but allowed the identification of potentially important carbonyl modifications on some of these abundant proteins.
Identification of DODE-modified Proteins-To identify DODE-modified protein candidates, proteins from the infected mouse serum were separated by two-dimensional SDS-PAGE, and the modified proteins were located by anti-DODE Western blot (supplemental material Fig. S1). Protein spots were cut from a corresponding Coomassie-blue-stained two-dimensional gel and digested with trypsin. Proteins were identified by nano-LC-MS/MS and Mascot database searching. A relatively small number of potential DODE carbonyl-modified proteins were identified by this method (these are highlighted in red in supplemental Table S1).
Identification of Carbonyl-modified Proteins-To identify serum proteins modified with carbonyls other than DODE, serum from the infected SJL mouse was reacted with the ARP, and modified proteins were enriched by avidin affinity as described in Materials and Methods. Enriched modified proteins were digested with trypsin, and the resulting peptides were separated by two-dimensional chromatography (SCX and reverse phase). Proteins were identified by tandem mass spectrometry and Mascot database searching. Supplemental Table S1 shows the complete list of putative carbonyl-modified proteins identified by both the two-dimensional gel anti-DODE method and the ARP two-dimensional chromatography method. Proteins identified by both methods are highlighted in red and were considered likely candidates for DODE-modified serum proteins. This list of proteins was used as a database for the identification of the carbonyl-modified peptides, as described below.
Database Searching of ARP-DODE Modified Peptides-The ARP tag can generate tag-specific fragments and neutral losses that inhibit successful database searching (24). To explore this issue further, a DODE-ARP modified cytochrome c standard was prepared as previously described (24,29). Modified cytochrome c was enriched by avidin affinity, digested with chymotrypsin, and analyzed by nano-LC-MS/MS (Materials and Methods). As previously reported (29), DODE was found to modify lysines on three chymotryptic peptides: LKKATNE (residues 98-104), AGIKKKTEREDLIAY (residues 83-97), and acyl-GDVEKGKKIF (residues 1-10). To evaluate how effectively database searching identifies ARP-derivatized carbonyl-modified peptides, the mass spectrometry data was searched using Mascot with the DODE-ARP modification (C24H37N5O7S) added to the Mascot modification file.
The derivatization of protein carbonyls by ARP in effect doubles the size of the modification, increasing the tendency of the modification to fragment during collision-induced dissociation (CID). Fig. 2 shows typical fragmentation patterns of the peptide acyl-GDVEKGKKIF (residues 1-10), both unmodified (Fig. 2A) and ARP-DODE modified (Fig. 2B). In Fig. 2B the ARP-tag-specific fragment ions, as well as peptide fragment ions with a neutral loss from the modification, are annotated. Fig. 2C shows the ARP fragments (m/z 227, m/z 332, m/z 299, and m/z 259).
Initially it was observed that the ARP-specific fragments (m/z 227, m/z 332, m/z 299, and m/z 259) decreased the Mascot statistical score to the extent that even visually good product ion spectra produced no positive hits. After the Mascot modification file was adjusted to ignore these fragments, the peptides were identified correctly. Mascot scores of modified peptides, however, remained statistically low (M-score = 15-20); it was therefore assumed that the ARP-specific fragments were not completely ignored but were reduced in significance while still being taken into account during scoring. This type of interference has also been observed with the biotin-containing ICAT tag (30).
Another consequence of ARP fragmentation was the introduction of diagnostic neutral losses into the MS/MS spectra. To improve the statistical score for peptide identification, the previously noted neutral losses of 331 Da and 539 Da were added to the Mascot modification file. Neutral losses for y/b fragment ions and peptide neutral losses were also accounted for. Scores then increased to 25, 30, and 22 for the peptides GDVEKGK*KIF (residues 1-10), LK*KATNE (residues 98-104), and AGIK*KKTEREDLIAY (residues 83-97), respectively. Unmodified peptides nonetheless consistently showed significantly higher scores (> 50); this discrepancy between the scores of modified and unmodified peptides can be explained as follows. The Mascot peptide score (Mowse score or M-score) is based on the probability, P, that the observed match between the experimental data set and the sequence database entry is a chance event: M-score = −10 × log10(P), i.e. the lower the probability, the higher the score. P depends directly on the number of theoretical fragment ions for a given peptide: the greater the number of theoretical fragments, the higher the probability of a chance match. The addition of neutral losses as well as modifications greatly increases the number of possible fragments. For example, Mascot calculated that the unmodified peptide GDVEKGKKIF would have 91 theoretical fragments; 24 of these 91 ions were identified, and the peptide received an M-score of 51. The equivalent DODE-ARP-modified peptide GDVEKGK*KIF had 181 calculated theoretical fragments because of neutral losses and the variable modification; 39 of the 181 ions were identified, and the peptide received an M-score of 19.
Assuming that both peptides (modified and unmodified) were treated identically, had similar mass spectra, similar number of background ions, similar fragmentation efficiency and similar peak list extractions, then scoring of the biotin modified peptides was directly limited by Mascot's probability, P, calculation. Achieving the highest score for biotin-modified peptides therefore becomes a balancing act. The addition of neutral losses to the search parameters must not overly increase the theoretical number of fragment ions (and thus increase P) but be sufficient to correctly identify peptide fragments (resulting in lowered P). For the example above, because the peptide neutral loss accounted for only one ion in the MS/MS spectrum, it was removed from the modification file. Although this ion was no longer scored, the theoretical number of possible fragments decreased to 178, which slightly increased the peptide M-score to 20.
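The dependence of P on the number of theoretical fragments can be made concrete with a toy binomial model in Python. This is an assumption-laden sketch, not Mascot's scoring, which is proprietary and more elaborate: each theoretical fragment is assumed to match a random peak with a fixed, hypothetical probability `p_random`, and P is the chance of at least the observed number of matches. The model is not meant to reproduce the specific M-scores reported above; it only shows the direction of the effect, namely that with the number of matched ions held fixed, adding theoretical fragments (for example, through extra neutral losses) raises P and lowers −10 × log10(P).

```python
from math import comb, log10


def toy_score(n_theoretical: int, n_matched: int, p_random: float = 0.05) -> float:
    """-10*log10(P) under a toy binomial model (NOT Mascot's proprietary scoring).

    P is the chance that at least n_matched of n_theoretical fragment masses would be
    matched by random peaks, each match occurring independently with probability p_random.
    """
    P = sum(comb(n_theoretical, i) * p_random**i * (1 - p_random) ** (n_theoretical - i)
            for i in range(n_matched, n_theoretical + 1))
    return -10 * log10(P)


# Same number of matched ions, more theoretical fragments -> larger P -> lower score.
print(toy_score(91, 24) > toy_score(181, 24))  # True
```

This is the trade-off described above: a neutral loss is worth adding to the search parameters only if the extra matched ions it explains outweigh the extra theoretical fragments it introduces.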
The Effect of Modification on Peptide M-Score-To better understand the effect of the ARP-DODE modification on peptide identification, the doubly charged unmodified and modified peptides (acyl-GDVEKGKKIF, residues 1-10) were subjected to various collision energies, and Mascot peptide M-scores and fragmentation efficiencies were calculated. Fragmentation efficiency was defined as the summed abundance of identified product ions divided by the total ion abundance. Because only the efficiency of peptide fragmentation was being measured, biotin/ARP-specific fragment ions (227, 259, 299, and 332) and neutral losses (331 and 539) were excluded from these calculations. The results indicated that the collision energies required to elicit a correct Mascot score (Fig. 3A) and for effective fragmentation (Fig. 3B) were almost twice as large for ARP-DODE modified peptides as for the corresponding unmodified peptides. In fact, energies sufficient to fragment unmodified peptides produced little fragmentation of modified peptides, which remained unidentified (score = 0). An overall decrease in the fragmentation efficiency of the modified peptide was observed. As seen previously, Mascot scores of modified peptides remained low compared with unmodified peptides. Scores for modified peptides also declined at higher collision energies because of the increasing dominance of biotin/ARP label-specific fragments in the MS/MS spectra.
The lipid peroxidation adducts 4-hydroxy-2(E)-nonenal (HNE-ARP), 4-oxo-2(E)-nonenal (ONE-ARP), and 9,12-dioxo-10(E)-dodecenoic acid (DODE-ARP) were added to the Mascot modification file, and search parameters were adjusted to account for ARP fragments and neutral losses as previously described. The DODE-modified peptide K*QTALAELVK (residues 549-558) was identified in mouse serum albumin (Fig. 4A). The DODE modification was found on lysine 549, causing a missed cleavage during tryptic digestion.
The peptide was identified with a peptide M-score of 34, statistically low but similar to the scores previously seen for the DODE-ARP cytochrome c peptides. Peptide fragmentation produced a complete series of y ions and a partial series of b ions, along with the neutral loss of the entire DODE-ARP modification from the peptide (m/z 1100) and the neutral loss of ARP from b2 (m/z 465). To further confirm this peptide modification, mouse serum albumin was reacted in vitro with DODE; the reacted protein was derivatized with ARP and digested with trypsin, and modified peptides were enriched by avidin affinity (Materials and Methods). The DODE-ARP modified tryptic peptide K*QTALAELVK was also identified, with an identical fragmentation pattern (Fig. 4B). The expected ARP fragment ions (227, 259, and 299) were identified in both the endogenously modified peptide and the in vitro DODE-ARP modified peptide, verifying ARP derivatization of these peptides.
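The fragmentation-efficiency measure used for the collision-energy experiments (summed abundance of identified product ions divided by total ion abundance, with ARP-specific fragments and neutral losses excluded) can be written as a short Python function. This is a sketch under assumptions: "total ion abundance" is taken to mean all product ion peaks remaining after the exclusions, ions are matched within a hypothetical 0.01 Da tolerance, and the caller is expected to add the m/z values of the neutral-loss products (which depend on the precursor) to `excluded_mz`. The ARP fragment masses are those used by the in silico filter program; the peak list in the example is invented.

```python
from typing import Iterable, List, Tuple

# ARP-specific fragment ions (m/z) used by the in silico filter program.
ARP_FRAGMENTS = (227.0854, 259.1223, 299.1127, 332.1387)


def _near(mz: float, targets: List[float], tol: float) -> bool:
    """True if mz lies within tol of any target m/z."""
    return any(abs(mz - t) <= tol for t in targets)


def fragmentation_efficiency(peaks: List[Tuple[float, float]],
                             identified_mz: Iterable[float],
                             excluded_mz: Iterable[float] = ARP_FRAGMENTS,
                             tol: float = 0.01) -> float:
    """Identified product-ion abundance / total abundance, ignoring excluded ions."""
    excluded = list(excluded_mz)
    identified = list(identified_mz)
    kept = [(mz, inten) for mz, inten in peaks if not _near(mz, excluded, tol)]
    total = sum(inten for _, inten in kept)
    matched = sum(inten for mz, inten in kept if _near(mz, identified, tol))
    return matched / total if total else 0.0


# Invented example: the ARP ion at m/z 227 is dropped before the ratio is taken.
peaks = [(120.08, 10.0), (227.0854, 50.0), (350.20, 30.0), (480.30, 60.0)]
print(fragmentation_efficiency(peaks, identified_mz=[350.20, 480.30]))  # 0.9
```

Excluding the ARP ions from both numerator and denominator keeps the metric focused on backbone fragmentation, so a spectrum dominated by label ions is not mistaken for an efficiently fragmented peptide.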

Identification of ARP-DODE Modified Peptides in SJL Mouse Serum
Analysis of the raw mass spectrometry data clearly showed an abundance of ARP-derivatized peptides, as seen by the presence of the ARP-tag-specific fragment ions, but further database searching by Mascot did not reveal other peptides that could be verified as modified by the carbonyls HNE, ONE, and DODE. As described above, the ARP fragments and neutral losses artificially decreased the peptide M-scores and substantially increased the number of false positives, thus requiring manual verification of possible hits. Real database hits were easily lost among the hundreds of low-scoring spectra that were incorrectly identified as carbonyl-modified. Manual searching of all spectra proved unrealistic, and it was also likely that lower-abundance modified peptides would remain unscored and therefore unidentified because of the interfering effects of the ARP fragments and neutral losses (as demonstrated with the DODE-ARP cytochrome c standards). Therefore a new method for the identification of these ARP-derivatized peptides was explored.
In Silico Filter Program for the Identification of ARP Derivatized Peptides-As previously noted, ARP-specific fragments and neutral losses (Fig. 2C) can inhibit successful database searching and increase the number of false positives. These same values, however, constitute a potential fingerprint for locating ARP-carbonyl modified peptides within the raw MS/MS data. Manual extraction of these ions from raw MS/MS data is possible but time-consuming and impractical; therefore a program, the in silico filter program, was written (Materials and Methods) to search for these signals in the MS/MS peak lists generated from data-dependent tandem mass spectrometry experiments. A general outline of the program is shown in Fig. 5. The raw data are first converted to peak lists using the Mascot script associated with the Analyst software.
This built-in script converts the raw mass spectrometry data into text files (mgf files, or Mascot Generic Format files) containing the parent ions selected by the mass spectrometer along with their associated CID fragment ions and intensities (see http://www.matrixscience.com/help/data_file_help.html). The in silico filter program scans the peak lists within an mgf file to identify lists that contain the ARP fragment ions (227.0854, 332.1387, 299.1127, and 259.1223). The program also searches for parent ions that show a neutral loss of 331.1344, corresponding to loss of the ARP (Fig. 2). It then creates a set of five new filtered mgf files. The first of these (filter file #1) is made up of peak lists containing the ion at m/z 227, the most frequent biotin fragment. The second (filter file #2) contains peak lists with m/z 227 plus one other fragment ion (m/z 332, 299, or 259). Peak lists in the third file (filter file #3) contain m/z 227 plus two or more fragment ions. The fourth file (filter file #4) requires peak lists to contain all four fragment ions. Finally, the fifth file (filter file #5) requires peak lists to contain all four fragment ions and the parent ion with the neutral loss of 331. These five filtered mgf files allow the user to choose the extent of filtration, with the first file the least filtered and the fifth file the most filtered. In general, the presence of three or more fragment ions (filter file #3) was sufficient to mark a peptide as modified with a high degree of confidence.
Fragmentation of the peptide backbone occasionally produced peptide-specific fragment ions with masses similar to those of the ARP biotin fragment ions. To increase the software's specificity for ARP-specific fragment ions, the user can enter a mass tolerance between ±1.0 Da and ±0.0001 Da. Values in the range of ±0.01-0.001 Da generally excluded ions associated with peptide fragmentation rather than ARP fragmentation. For these experiments, all tandem mass spectrometry runs contained an internal standard, allowing a mass accuracy within 5 ppm (about ±0.001 Da).
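The link between the 5 ppm accuracy and an absolute tolerance of about ±0.001 Da is simple arithmetic (tolerance in Da = m/z × ppm × 10⁻⁶), sketched here in Python for the four ARP fragment ions. The function names are illustrative and are not part of the authors' program.

```python
def ppm_to_da(mz: float, ppm: float) -> float:
    """Absolute mass tolerance (Da) equivalent to a relative error in ppm at a given m/z."""
    return mz * ppm * 1e-6


def matches(observed: float, theoretical: float, tol_da: float) -> bool:
    """True if an observed ion lies within tol_da of the theoretical m/z."""
    return abs(observed - theoretical) <= tol_da


# 5 ppm at the ARP fragment masses corresponds to roughly 0.001-0.002 Da.
for mz in (227.0854, 259.1223, 299.1127, 332.1387):
    print(mz, round(ppm_to_da(mz, 5), 4))
```

At m/z 227.0854, for instance, 5 ppm is about 0.0011 Da, so a ±0.01 Da window is roughly ten times wider than the measurement error while still far narrower than the ±1.0 Da upper setting.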
To prevent the program from selecting apparent ARP fragment ions from the noise of the MS/MS spectra, the program calculates the maximum peak intensity for the ions in each individual peak list. The user then enters a percentage of this maximum intensity as a threshold above which ARP fragments are considered to be present. By inspection, we noted that for ARP-derivatized peptides the most intense ion was usually the m/z 227 biotin fragment. In general, a threshold of 10% of the maximum peak intensity excludes ions associated with noise while still allowing the program to detect some of the lower-abundance authentic ARP fragments. The ability of the filter program to highlight only ARP-derivatized peptides was tested with an ARP-DODE cytochrome c standard that was digested with chymotrypsin and analyzed by LC-MS/MS. The raw mass spectrometry data were extracted into an mgf peak list file and filtered using the in silico filter program, with mass accuracy set to ±0.01 Da and a threshold of 10% of maximum peak intensity for fragment ions. The filtered mgf output files and the unfiltered mgf file were then analyzed with Mascot. Table I shows the Mascot results from the unfiltered mgf file (Table IA) and from filter file #3, which requires three or more ARP fragment ions (Table IB). The unfiltered file resulted in the identification of cytochrome c and its DODE-ARP modified peptides, as well as many unmodified peptides of cytochrome c, chymotrypsin, and keratin. The unfiltered file also yielded one false-positive modified peptide from chymotrypsin. The search of the filtered data identified only the modified peptides (Table IB), notably removing the chymotrypsin false positive. None of the DODE-ARP cytochrome c peptides were lost in the filtering process.
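The tolerance and relative-intensity tests can be sketched as follows. This is a hedged Python illustration, not the published code: peaks are assumed to be (m/z, intensity) pairs already parsed from one mgf peak list, and the neutral-loss check assumes the loss appears as an ion at the precursor m/z minus 331.1344/z for a precursor of charge z, which is our assumption rather than a detail given in the text. All function names are hypothetical.

```python
# Hypothetical sketch of the fragment-ion and neutral-loss tests; not the published code.
ARP_FRAGMENTS = (227.0854, 332.1387, 299.1127, 259.1223)  # ARP fragment m/z values
ARP_NEUTRAL_LOSS = 331.1344                               # neutral loss of the ARP tag


def intensity_cutoff(peaks, rel_threshold):
    """Absolute intensity cutoff: rel_threshold x the most intense peak."""
    return rel_threshold * max(inten for _, inten in peaks)


def detect_arp_ions(peaks, tol=0.01, rel_threshold=0.10):
    """Return the ARP reference masses present in one peak list.

    peaks: list of (mz, intensity) pairs. An ARP ion counts as present only
    if some peak lies within +/-tol Da of it AND is at least rel_threshold
    of the base-peak intensity (10% by default, as in the text).
    """
    if not peaks:
        return set()
    cutoff = intensity_cutoff(peaks, rel_threshold)
    return {
        ref
        for ref in ARP_FRAGMENTS
        if any(abs(mz - ref) <= tol and inten >= cutoff for mz, inten in peaks)
    }


def has_neutral_loss(peaks, precursor_mz, charge, tol=0.01, rel_threshold=0.10):
    """Assumption: the tag loss shows up at precursor_mz - 331.1344 / charge."""
    if not peaks:
        return False
    cutoff = intensity_cutoff(peaks, rel_threshold)
    target = precursor_mz - ARP_NEUTRAL_LOSS / charge
    return any(abs(mz - target) <= tol and inten >= cutoff for mz, inten in peaks)


# Toy peak list: the weak m/z 299 peak (50 < 10% of 1000) is ignored
peaks = [(227.0855, 1000.0), (332.1390, 300.0), (299.1120, 50.0),
         (259.1223, 200.0), (634.4330, 500.0)]
print(sorted(detect_arp_ions(peaks)))                         # [227.0854, 259.1223, 332.1387]
print(has_neutral_loss(peaks, precursor_mz=800.0, charge=2))  # True
```

The output of `detect_arp_ions` is exactly the kind of set the tier function for the five filter files would consume.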
Searching with filter file #5 (requiring all five ARP-associated ions) reduced the results to three modified peptides, whereas searching with filter file #1 (requiring only the ARP fragment at m/z 227) only slightly increased the number of unmodified peptides. It was concluded that a minimum of three ARP fragment ions with abundances above 10% of the maximum ion was sufficient to filter modified peptides. A corresponding study using DODE-ARP modified BSA is included in supplemental Fig. S2.
In summary, the in silico filter program highlighted only carbonyl-ARP modified peptides and significantly reduced the complexity of the mgf file submitted for database searching. Three ARP fragment ions were sufficient to filter modified peptides.
Biotin Peptide Identification Program
The in silico filter program output files for the DODE-ARP cytochrome c standard (described above) contained a significant number of modified peptides (peak lists) that remained unidentified by Mascot; therefore, a de novo-like sequencing database search algorithm was written: the BPI program. The BPI program was written to search the mgf files generated by the in silico filter program and to identify the modified peptides without prior knowledge of the molecular weight of the carbonyl modification. This made it possible to search for the many possible protein carbonyl variants at once.
To run the BPI program, the user first enters the desired modification mass range (from ±0.0001 Da to as large as necessary). The program then selects a peak list from the filtered mgf file. To improve identification specificity, the fragment ion masses in the peak list are calibrated against the known masses of the ARP fragment ions (single- or multi-point calibration, as specified by the user). The program then calculates a background intensity level for the peak list being searched, so that ions in the noise of the spectrum are not considered (further information on the background calculation is available at http://web.mit.edu/toxms/www/filters.htm). The BPI program virtually digests (with trypsin or chymotrypsin) the protein database being searched. The parent ion mass from the peak list is compared with the mass of each virtually digested peptide from the database. If the mass difference falls within the specified modification mass range, it is saved as a potential carbonyl modification mass for that peptide. The peptide is then identified by comparing the theoretical y and b ions of the virtual peptide with the experimental fragment ions in the peak list, as described below.
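The candidate-selection step (virtual digestion followed by comparison of parent and peptide masses) can be sketched in Python as below. The assumptions are ours, not the BPI program's: trypsin is modeled by the common cleave-after-K/R-except-before-P rule, masses are monoisotopic, one missed cleavage is allowed by default, the parent ion is converted to a neutral mass from its m/z and charge, and the calibration and background steps are omitted. Function names are hypothetical.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def tryptic_digest(seq, missed=1):
    """Virtual trypsin digest: cleave after K/R unless the next residue is P."""
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]
    peptides = []
    for i in range(len(pieces)):
        # join up to `missed` neighbouring pieces to model missed cleavages
        for j in range(i, min(i + missed + 1, len(pieces))):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def peptide_mass(pep):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE[aa] for aa in pep) + WATER


def candidate_mods(precursor_mz, charge, proteins, low, high, missed=1):
    """Yield (protein, peptide, delta) when parent minus peptide mass is in [low, high]."""
    precursor_mass = (precursor_mz - PROTON) * charge  # neutral mass of the parent ion
    for name, seq in proteins.items():
        for pep in tryptic_digest(seq, missed):
            delta = precursor_mass - peptide_mass(pep)
            if low <= delta <= high:
                yield name, pep, delta


# Toy example: a 2+ parent ion built from AWAVAR plus a hypothetical 450 Da adduct
proteins = {"toy_protein": "AFKAWAVAR"}
precursor_mz = (peptide_mass("AWAVAR") + 450.0) / 2 + PROTON
for name, pep, delta in candidate_mods(precursor_mz, 2, proteins, 329.0, 800.0):
    print(name, pep, round(delta, 2))
```

In this toy case both AWAVAR (the true peptide, difference 450 Da) and AFK (difference about 758 Da) fall inside the 329-800 Da window. This illustrates why, when the modification mass is unknown, the mass difference alone cannot identify a peptide and a fragment-ion test is also required.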
For a match to be significant, the program first calculates the percent coverage of unmodified y and b fragment ions found in the experimental peak list. If this is lower than a user-specified cut-off (25-50%), the peptide is rejected. (Because the peak lists being searched are generated by the in silico filter program, they are all considered to carry at least one carbonyl modification. For each backbone cleavage, either the b ion or its complementary y ion contains the modified residue, so 50% is the maximum possible coverage for unmodified y and b ions.) Fragment ions from the experimental peak lists of peptides that pass this test are then compared a second time with the virtual fragment ions, this time including the calculated modification mass. A peptide is considered a "hit" if a minimum of five sequential y or b ions are identified. For example, y3, y4, y5, y6, and y7 constitute a hit, whereas y3, y4, y5, y7, and y8 do not. The user can require between 0 and 5 sequential ions to allow for more or less rigor. This twofold process (percent coverage followed by sequential ions) is necessary to limit false positives, because the modification mass is unknown. Once the peptide is identified, the BPI program back-calculates the mass of the potential protein carbonyl modification by subtracting the appropriate mass of the ARP biotin tag, and outputs the peptide, the protein, and the calculated carbonyl mass. A general outline of the BPI program is shown in Fig. 5.
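The two-stage test can be sketched in Python as follows. This is a hypothetical illustration, not the BPI source. Its assumptions are that fragment ions are singly charged, that a single modification sits at a candidate residue index supplied by the caller (the text does not say how the BPI program locates the site), and that coverage is counted over b1 to b(n-1) and y1 to y(n-1). The defaults (35% coverage, five sequential ions) are the serum-search settings; the back-calculation of the carbonyl mass is omitted.

```python
# Hypothetical sketch of the two-stage fragment test; not the published BPI code.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(seq, mod_index=None, mod_mass=0.0):
    """Singly charged b/y m/z values keyed "b1".."b(n-1)", "y1".."y(n-1)".

    If mod_index is given, mod_mass is added to that residue, shifting every
    ion that contains it.
    """
    masses = [RESIDUE[aa] for aa in seq]
    if mod_index is not None:
        masses[mod_index] += mod_mass
    n = len(seq)
    b = {f"b{i}": sum(masses[:i]) + PROTON for i in range(1, n)}
    y = {f"y{i}": sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)}
    return b, y


def matched_indices(theoretical, peaks_mz, tol):
    """Sorted ion numbers (e.g. 3 for y3) whose m/z matches an observed peak."""
    return sorted(int(name[1:]) for name, mz in theoretical.items()
                  if any(abs(mz - p) <= tol for p in peaks_mz))


def longest_run(indices):
    """Length of the longest run of consecutive ion numbers, e.g. y3-y7 -> 5."""
    best = run = 0
    prev = None
    for i in indices:
        run = run + 1 if prev is not None and i == prev + 1 else 1
        best = max(best, run)
        prev = i
    return best


def is_hit(seq, mod_index, mod_mass, peaks_mz, tol=0.01, min_coverage=0.35, min_run=5):
    """Stage 1: unmodified b/y coverage; stage 2: sequential modified b or y ions."""
    if len(seq) < 2:
        return False
    b0, y0 = fragment_ions(seq)
    n_matched = (len(matched_indices(b0, peaks_mz, tol))
                 + len(matched_indices(y0, peaks_mz, tol)))
    if n_matched / (len(b0) + len(y0)) < min_coverage:
        return False  # too few unmodified fragments: reject
    b, y = fragment_ions(seq, mod_index, mod_mass)
    run = max(longest_run(matched_indices(b, peaks_mz, tol)),
              longest_run(matched_indices(y, peaks_mz, tol)))
    return run >= min_run


print(longest_run([3, 4, 5, 6, 7]), longest_run([3, 4, 5, 7, 8]))  # 5 3
```

The `longest_run` example reproduces the y3-y7 versus y3, y4, y5, y7, y8 case from the text. A caller would loop `is_hit` over each candidate residue (for example K, H, or C) for every mass difference retained by the precursor comparison.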
The BPI program was written to identify carbonyl-modified peptides that were not identified by Mascot. To test this, the tandem mass spectrometry mgf files generated from the ARP-DODE cytochrome c standard were filtered using the in silico filter program (described above). Filter file #3 was searched with both the BPI program and Mascot. To save time, a minimal database containing only cytochrome c and serum albumin was used. The results are shown in Table II. [...] The BPI program identified these three peptides as well as seven other ARP-DODE modified peptides (Table IIB). These previously unidentified peptides included missed cleavages and secondary chymotryptic cleavages, as well as the peptide ⁶¹K*EETL⁶⁵, which has also been identified as a minor DODE modification site of cytochrome c (24).

Identification of Carbonyl Modified Peptides in SJL Mouse Serum by Database Search with the BPI Program
To improve the outcome of the BPI program, a searchable database was made from the proteins previously identified as carbonyl modified (supplemental Table S1). This database served two purposes. First, the BPI program is slow, and a smaller database enabled more searches. Second, peptides identified from this database had a higher likelihood of being true hits because the proteins had already been identified as carbonyl modified. To identify the carbonyl modifications, the mgf files from the RcsX-infected SJL mouse serum and control mouse serum (described above) were filtered using the in silico filter program, with a mass accuracy of ±0.01 Da and a threshold of 10% of maximum peak intensity for fragment ions. As described above, the presence of three major ARP fragment ions in a spectrum was sufficient for the filter program to select a peptide as modified, so filter file #3 was submitted to the BPI program for database searching. The BPI search conditions were as follows: protease: trypsin; missed cleavages: 1; fragment ion mass accuracy: ±0.01 Da; N-terminal acetyl modification: no; variable modification oxidized methionine: yes; percent coverage of unmodified fragment ions in spectrum: 35%; number of sequential fragment ions required in spectrum: 5; low mass range: 329; high mass range: 800; background intensity level: 1. The low mass limit was chosen so that the identified peptides would carry a modification at least equivalent to the molecular weight of ARP. The high mass limit of 800 was considered sufficient to identify larger carbonyl modifications such as DODE.
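The search conditions listed above can be collected into a single configuration object. The Python dataclass below is a hypothetical illustration. Its field names are ours, not those of the BPI program, and its defaults are the values reported for the serum search. The validation ranges (0 to 5 sequential ions, 25-50% unmodified coverage, trypsin or chymotrypsin) come from the earlier description of the BPI program.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BPISearchParams:
    """Hypothetical container for the BPI search settings reported in the text."""
    protease: str = "trypsin"
    missed_cleavages: int = 1
    fragment_tol_da: float = 0.01          # fragment ion mass accuracy, +/- Da
    n_term_acetyl: bool = False            # N-terminal acetyl modification
    variable_met_oxidation: bool = True    # oxidized Met as a variable modification
    min_unmodified_coverage: float = 0.35  # fraction of unmodified b/y ions required
    min_sequential_ions: int = 5           # sequential b or y ions required for a hit
    mod_mass_low: float = 329.0            # low end of modification mass range (Da)
    mod_mass_high: float = 800.0           # high end of modification mass range (Da)
    background_level: float = 1.0          # background intensity level setting

    def __post_init__(self):
        if self.protease not in ("trypsin", "chymotrypsin"):
            raise ValueError("protease must be trypsin or chymotrypsin")
        if not 0 <= self.min_sequential_ions <= 5:
            raise ValueError("sequential ion requirement must be between 0 and 5")
        if not 0.25 <= self.min_unmodified_coverage <= 0.50:
            raise ValueError("unmodified coverage cut-off must be 25-50%")
        if self.mod_mass_low >= self.mod_mass_high:
            raise ValueError("low mass limit must be below high mass limit")

    def in_mod_range(self, delta):
        """True if a parent-minus-peptide mass difference is inside the search window."""
        return self.mod_mass_low <= delta <= self.mod_mass_high


params = BPISearchParams()  # settings used for the SJL mouse serum search
print(params.in_mod_range(450.0), params.in_mod_range(900.0))  # True False
```

The 300-800 Da window used for the semialdehyde and glycan searches would correspond to `BPISearchParams(mod_mass_low=300.0)`.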
The program was run to identify lipid peroxidation carbonyl modifications on K, H, and C; direct side-chain oxidation of P, R, and K to the semialdehyde carbonyls; and the theoretical oxidation of O- and N-linked glycans at S, T, and N. For these searches, the in silico filtered data were searched with a modification mass range of 300-800 Da. Because the oxidation of P, R, and K to form the semialdehydes can reduce the molecular weight of the peptide, a low mass of 300 was used instead of 331 (the molecular weight of ARP).
The search results for the RcsX-infected SJL mouse serum peptides are shown in Table IV. The MS/MS spectra of all peptide hits were verified manually and are included in the supplemental materials. As previously seen by Western blot (Fig. 1), albumin was highly carbonyl-modified. The DODE-modified peptide ⁵⁴⁹K*QTALAELVK⁵⁶⁰, previously identified by Mascot, was identified again, and the albumin peptide ⁵⁴⁶QIK*K⁵⁴⁹ was also identified as DODE modified. The modification of both lysines 548 and 549 was not surprising, because runs of lysines have previously been seen to be modified by DODE in our in vitro experiments (30). Albumin was also modified at peptide ²²²MK*CSSMQK²²⁹; this modification corresponds to oxidation of the lysine 223 side chain to the aminoadipic semialdehyde (Scheme 1). A fourth carbonyl modification was found on lysine 236 of the albumin peptide ²³⁴AFK*AWAVAR²⁴², with a calculated carbonyl modification mass of +112 Da. This carbonyl adduct potentially corresponds to the lipid peroxidation product 2-heptenal. Haptoglobin was modified at peptide ¹¹²GSFPWQAK*M*ISR¹²³: lysine 119 was directly oxidized to the aminoadipic semialdehyde, and methionine 120 carried the +16 Da oxidation product, which had been included as a variable modification in the BPI search. An ONE lipid peroxidation carbonyl modification was identified on lysine 166 of the hemopexin peptide ¹⁶⁶K*WFWDFATR¹⁷⁴. Carbonyl modifications were also identified on hemopexin at peptide ¹⁰³GPDSVFLIK*EDK¹¹⁴, with oxidation of lysine 111 to the aminoadipic semialdehyde, and at peptide ⁷⁸GHSGTR*ELISAR⁸⁹, with oxidation of arginine 83 to the glutamic semialdehyde. An HNE lipid peroxidation carbonyl modification was identified at histidine 335 of the inter-alpha-trypsin inhibitor heavy chain 3 peptide ³³⁴DH*LVQATPANLK³⁴⁵.
An HNE modification was found at histidine 431 of the murinoglobulin-1 peptide ⁴³¹H*ASAK⁴³⁵. A second HNE carbonyl modification was identified at histidine 856 on the murinoglobulin-1 or murinoglobulin-2 peptide ⁸⁵⁶H*TSSWLVTPK⁸⁶⁵. Pregnancy zone protein (or α-2-macroglobulin precursor) was modified by oxidation of lysine 306 to the aminoadipic semialdehyde on peptide ³⁰⁵TK*VFQLR³¹¹ and by a possible 2-heptenal modification at lysine 1205 on peptide ¹²⁰⁴VK*ALSFYQPR¹²¹³. Transferrin was modified by oxidation of proline 164 to the glutamic semialdehyde carbonyl on peptide ¹⁶³SP*LEK¹⁶⁷; by oxidation of lysine 299 to the aminoadipic semialdehyde on peptide ²⁹⁸SK*DFQLFSSPLGK³¹⁰; by modification of histidine 268 with the lipid peroxidation adduct HNE on peptide ²⁶⁵IPSH*AVVAR²⁷³; and by oxidation of either lysine 274 or lysine 278 to the aminoadipic semialdehyde carbonyl on peptide ²⁷⁴K?NNGK?EDLIWEILK²⁸⁷ (the exact modification site could not be determined).
As shown in Table IV, carbonyl modification was detected on a number of proteins from the database, including α-2-HS-glycoprotein, α-2-macroglobulin, apolipoprotein A1, apolipoprotein E, ceruloplasmin isoforms a and b, complement component 3, esterase 1, and serine (or cysteine) proteinase inhibitor, although the specific modified peptides could not be identified. The inability to identify a carbonyl modification on Apo A1 was surprising, considering that its modification by HNE has been reported (31).
DISCUSSION
Carbonyl-modified proteins have proven difficult to analyze, and only a few studies have correctly identified the peptides from endogenously modified proteins. In this study, a number of protein carbonyls were identified in serum from the SJL mouse inflammation model. Modified proteins and peptides were identified using biotin and avidin affinity in conjunction with new in-house software that filtered and searched spectra not identified during Mascot database searching. Mascot is a powerful tool for database searching, quantification, and identification of protein modifications. Large modifications, however, can fragment during MS/MS CID, and the resulting ions can interfere with Mascot searches (23,33). Regardless of spectral quality, ARP-biotin modified peptides generate lower Mascot scores than the corresponding unmodified peptides. To identify modified peptides, the researcher must therefore lower the statistical cut-off at which spectra are accepted or rejected, which increases the number of false positives. Furthermore, the low scores require that all identified peptides be verified manually. Even relatively simple systems (such as the ARP-DODE-modified cytochrome c standard used here) become difficult to analyze and yield many low-scoring identifications, most of which are artifacts.
For an in vivo sample, these problems are compounded by the complexity of the sample and the sample processing, resulting in an inevitable increase in the generation of false positives.
Two programs were written to help overcome some of the problems associated with database searching and identifying biotinylated carbonyl-modified proteins in SJL mouse serum. The in silico filter program takes advantage of the biotin-ARP modification-specific ions that are formed during peptide fragmentation. The same ions that impede effective peptide identification can be used as a fingerprint to highlight modified peptides in the mass spectrometry data. Using this program, modified peptides are filtered in silico from the mass spectrometry data, resulting in a new data file containing only carbonyl-modified peptides. Mascot database searching of the filtered files results in fewer false positives, simplifying manual verification of modified peptides.
To improve the identification of the in silico filtered peptides, other evaluation criteria were explored using an in-house database search algorithm (BPI) that evaluates the percentage of theoretical unmodified peptide fragments (y and b ions) observed and the number of sequentially identified modified and unmodified y and b ion fragments (the longer the sequence, the lower the probability of a mismatch). Peptides are searched in this de novo-like, fragment-dependent manner, so modified peptides can be identified without knowing the carbonyl modification mass. The calculated modification mass is part of the BPI software output, along with the peptide and protein identification. The program identifies ARP-carbonyl modified peptide standards from a mammalian database and calculates the correct modification mass with a minimal number of false identifications.
The BPI program was written as a specialized database search engine for identifying biotin-derivatized peptides in data generated by the in silico filter program. It was also designed to use a simplified database of known modified proteins. It therefore does not calculate probability scores for peptides; instead, its output is kept to a minimum so that it can easily be verified manually. Without probability scores, the program does not bias searches with scoring criteria and is able to identify modified peptides not seen during Mascot searching. Although the program allows complete access to the data analysis portion of this research, it is relatively inflexible as a general database search engine, because general searching requires good statistical peptide scoring and the ability to search large quantities of data in a relatively short time.
It should be stated that both the in silico filter and BPI programs were written as a semi-empirical approach to identifying spectra from modified peptides. The programs identify spectra corresponding to a few modified peptides, making manual verification possible. These programs use no statistical calculations to limit the number of false positives, so manual verification is required. The BPI and in silico filter programs were written to help identify peptides with large biotinylated modifications and are therefore somewhat specialized. A referee noted that comparing the approach across instrument types, e.g. ion traps (typically richer in b ions) versus quadrupole time-of-flight, could expand the range and usefulness of the software for identifying other large peptide modifications. Both program codes are available at http://web.mit.edu/toxms/www/filters.htm.
The primary goal of this study was to identify carbonyl protein modifications in serum from the SJL mouse. A number of modifications from the lipid peroxidation pathway were identified, including HNE, ONE, DODE, and a potential 2-heptenal modification. HNE modification of histidine was identified on three proteins (Table IV), whereas DODE and ONE modifications were identified only on albumin and hemopexin, respectively. This result is somewhat contrary to our previous studies, in which DODE was the most reactive aldehyde formed from the hydroperoxide of linoleic acid (30). A more abundant HNE modification in vivo may not be surprising, however, because lower reactivity would give HNE a longer half-life, allowing more opportunities for protein modification.
Carbonylation also arose through the direct oxidation of proline, lysine, and arginine side chains to form the aminoadipic and glutamic semialdehydes. These modifications were identified on albumin, haptoglobin, hemopexin, pregnancy zone protein, and transferrin. Such semialdehydes have often been considered a major source of protein carbonyls arising from free radicals generated by copper or iron Fenton chemistry. Transferrin and hemopexin were highly modified with semialdehyde carbonyls, with hemopexin containing the only oxidized arginine identified in the study. This is an interesting result, considering that the biological functions of these proteins involve the sequestering and transport of iron, which likely makes them more susceptible to Fenton chemistry.
As shown in Table IV, the modified peptides of a number of candidate carbonyl-modified proteins remained unidentified. These were marked as "potential DODE" modifications if the protein was identified from an anti-DODE Western blot, or as "unknown carbonyl" if the protein was identified after biotin and avidin enrichment. The identities and potential carbonyl modifications of these proteins will be addressed in future experiments.