The Use of Peptide Markers of Carp and Herring Allergens as an Example of Detection of Sequenced and Non-Sequenced Proteins

Introduction
Food allergies, including allergies to fish and fish proteins, pose a significant challenge for food science and medicine (1,2). Food scientists continue to search for new analytical methods that effectively detect allergens in foods. Therefore, it is important to develop analytical methods for the identification of allergens. Mass spectrometry is one of the methods recommended for this purpose (3–5).
Johnson et al. (4) formulated several recommendations concerning the selection of peptides to be used as allergen markers. One of the main principles they proposed is uniqueness: a peptide should be unique to its precursor protein. Parvalbumins or their fragments are recommended as unique markers of species-specific fish allergens due to their high sequence variability (8,12). A single peptide marker capable of detecting several allergens can also be identified (13). A group of allergens that can be identified by a single peptide belongs to the same family in the AllFam database (14). According to the principles of comparative proteomics, a change in the paradigm determining the choice of peptide biomarkers could support the identification of proteins with unknown sequences (15,16). The carp (Cyprinus carpio) is an example of a fish species with many known protein sequences, whereas the herring (Clupea harengus) has never been studied extensively with the aim of determining its protein sequences. Screening the UniProt database (17) with the Latin name of the carp returned 2564 protein sequences, whereas the Latin name of the herring occurs in only 275 entries (status as of 07.03.2016). In both species, the number of sequences includes entries considered putative or derived from homology, but not confirmed at the protein level.
In Allergome (18), the largest database of allergens, the listed allergens are not only proteins, but also tissues or species, e.g. fish. For example, the herring (Clupea harengus, Allergome code 1370) and the carp (Cyprinus carpio, Allergome code 1797) are listed together with different proteins found in these species. Allergens listed in this way can be detected by identifying fragments of proteins synthesized by the analyzed organism. Fragments of allergenic proteins do not always have to be detected. In addition to parvalbumins, the best known fish allergens (2), other proteins may also be used as biomarker precursors. Myosins are the most prevalent myofibrillar proteins, and their fragments are generally easy to detect. Sequence similarity was reported among various myosins (19), which makes them good candidates for comparative proteomic analysis. Carp myosins are broadly represented in the UniProt database (17), whereas very few sequences of herring myosins are known.
The aim of this study is to identify fish protein markers for detecting multiple species based on a comparative proteomic approach that relies on fragments with identical sequences. The possibilities and challenges of the use of peptides obtained from fish myosins and parvalbumins are discussed.

In silico analysis
Sequences of 112 carp (Cyprinus carpio) proteins and 19 herring (Clupea harengus) proteins from the UniProt database (17) were used. Most of the analyzed fish proteins were myosins and parvalbumins. Parvalbumins were treated as an example of precursors of peptides suitable for use as unique markers, whereas myosins, owing to their sequence conservation, were treated as precursors of peptides suitable for analysis based on the principles of comparative proteomics.
The first stage of the study was the in silico analysis of carp and herring protein sequences to identify fragments of allergenic proteins that differ from fragments of nonallergenic proteins. The analysis was performed using EVALLER software (20).
The program identified three fragments of individual proteins that best met the above criterion. Similarities with known fragments of allergenic proteins were identified based on the Smith-Waterman score (21). Protein fragments were submitted for further analysis when the score was 84 or higher (the minimum for proteins considered by the program as allergens).
The next stage of the study involved in silico proteolysis of potentially allergenic proteins selected by the EVALLER program. In silico proteolysis was performed using the PeptideMass application (22). Two enzymes, trypsin (EC 3.4.21.4) and pepsin (EC 3.4.23.1), were used to identify fragments released after proteolysis. The fragments produced by in silico proteolysis were regarded as potential markers if they were part of the fragments displayed in EVALLER. Peptides containing at least seven amino acid residues were submitted for further analysis. Fragments with a mass-to-charge ratio higher than 2000 Da were excluded. If several fragments produced by in silico proteolysis occurred in one protein sequence, one fragment from carp protein and two fragments from herring protein with the highest sequence cross-coverage (SCC) between the proteolytic fragment and the fragment displayed in EVALLER were selected. SCC, expressed as a percentage, was calculated with the equation described previously (13). In that equation, SCC is the sequence cross-coverage between the expected proteolysis product and the corresponding epitope, N_c is the number of amino acid residues common to the expected proteolysis product and the fragment predicted by the EVALLER program, N_e is the number of amino acid residues in the sequence of the fragment predicted by the EVALLER program, and N_p is the number of amino acid residues in the sequence of the predicted proteolytic product.
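The selection rules described above can be illustrated with a minimal Python sketch. It is not the PeptideMass implementation: it assumes the common simplified trypsin rule (cleavage after K or R unless the next residue is P), uses the neutral monoisotopic mass as a stand-in for the 2000 limit, and runs on a made-up protein sequence and a made-up EVALLER-like fragment. All function names are hypothetical.

```python
import re

# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def tryptic_digest(sequence):
    """Split after K or R unless followed by P (simplified rule, an assumption)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def candidate_markers(sequence, evaller_fragment, min_len=7, max_mass=2000.0):
    """Keep peptides that lie inside the EVALLER fragment and pass both filters."""
    return [
        p for p in tryptic_digest(sequence)
        if len(p) >= min_len
        and monoisotopic_mass(p) <= max_mass
        and p in evaller_fragment
    ]


# Made-up sequences, for illustration only.
protein = "MAAKSFAGLNDADIAAALKACK"
fragment = "KSFAGLNDADIAAALKAC"
print(candidate_markers(protein, fragment))
```

Pepsin has a broader and less predictable specificity, so an analogous sketch for pepsin would need a different and more tentative cleavage rule.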
The SSRCalc program (23), with a correction proposed by Dziuba et al. (24), was used to predict peptide retention times. The correction equation (24) was applied with the following parameters to calculate the predicted retention time (t_R,predicted): elution time of unretained compounds 2.02 min, parameter b 0.94, and pore diameter (the closest to the pore diameter of the applied column) 100 Å (24). MS/MS spectra were simulated in the fragment ion calculator program (25). Various types of typical fragment ions, including products of neutral loss (release of water or ammonia) (26), were taken into account.
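The simulation of an MS/MS spectrum reduces to sums of residue masses. The sketch below is an illustration under stated assumptions, not the fragment ion calculator (25) itself: it computes singly charged b and y ions, their water and ammonia neutral-loss variants, and precursor m/z values from monoisotopic masses of unmodified residues. The function names are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.00728
WATER = 18.01056
AMMONIA = 17.02655


def fragment_ions(peptide):
    """Singly charged b and y ions (m/z) for an unmodified peptide."""
    n = len(peptide)
    ions = {}
    for i in range(1, n):
        # b ion: N-terminal residues plus one proton.
        ions[f"b{i}"] = sum(RESIDUE_MASS[aa] for aa in peptide[:i]) + PROTON
        # y ion: C-terminal residues plus water plus one proton.
        ions[f"y{i}"] = sum(RESIDUE_MASS[aa] for aa in peptide[n - i:]) + WATER + PROTON
    return ions


def neutral_losses(ions):
    """Add water-loss and ammonia-loss variants for every fragment ion."""
    lost = {}
    for name, mz in ions.items():
        lost[f"{name}-H2O"] = mz - WATER
        lost[f"{name}-NH3"] = mz - AMMONIA
    return lost


def precursor_mz(peptide, charge):
    """m/z of the protonated peptide, i.e. (M + zH+)/z."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (mass + charge * PROTON) / charge


ions = fragment_ions("DKKNVIRL")
print(round(precursor_mz("DKKNVIRL", 1), 3), round(precursor_mz("DKKNVIRL", 2), 3))
```

Which neutral losses are chemically plausible depends on the residues present in each fragment; the sketch applies both losses to every ion and leaves that filtering to the reader.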
Protein sequences in the UniProt database (17) containing the experimentally identified peptides were searched for using the previously described (27) parameters (PAM10 matrix, expected threshold 1000) in the WU-BLAST program (28). The preceding amino acid residues that formed bonds susceptible to hydrolysis by trypsin or pepsin were also included, except for the N-terminal parvalbumin fragment, according to previous recommendations (13).

Experimental analysis
Carp and herring were purchased from a local market (Olsztyn, Poland). Carp was supplied by the Szwaderki fish farm (Olsztynek, Poland), and herring was harvested from the Baltic Sea. Carp and herring were purchased as fresh carcasses and transported immediately to the laboratory, where they were filleted, portioned and packaged on the same day. The prepared experimental material was frozen at -70 °C. Sarcoplasmic proteins were extracted from both fish species by the methods proposed by Bugajska-Schretter et al. (29) and Carrera et al. (12). Extraction of sarcoplasmic proteins was carried out using frozen fish. A mass of 100 g of carp or herring white muscle was homogenized with two volumes of 10 mM Tris-HCl, pH=7.2, supplemented with 5 mM phenylmethylsulfonyl fluoride (PMSF), using a commercial blender (Waring, Torrington, CT, USA) for approx. 1 min. Fish extracts were centrifuged (centrifuge model 3K30; Sigma Laborzentrifugen GmbH, Osterode am Harz, Germany) at 40 000×g for 25 min at 4 °C. The supernatants were filtered using membrane filters (0.22 μm; Whatman, GE Healthcare Life Sciences, Dassel, Germany), lyophilized and stored at -70 °C until analysis.
Myofibrillar proteins were isolated according to the method described by Martinez et al. (30). Before extraction, carp or herring fillets were placed in a freezer at -20 °C. White fish muscle was scraped while still frozen. A volume of 80 mL of Tris, pH=10.5, was added to 6 g of white fish muscle and homogenized for approx. 1 min. After homogenization, samples were centrifuged at 15 000×g for 7 min at 4 °C. After centrifugation, the supernatants (Tris extracts) were collected and frozen at -70 °C. A volume of 40 mL of a solution of 8 M urea, 4 % 3-[(3-cholamidopropyl)dimethylammonio]-2-hydroxy-1-propanesulfonate (CHAPS), 2 mM tributyl phosphate (TBP), 40 mM Tris and 0.2 % IPG (immobilized pH gradient) was added to the pellet. Samples were homogenized for 30 s and centrifuged at 15 000×g for 7 min at 4 °C, and the supernatants (CHAPS-urea extracts) were frozen at -70 °C. Protein samples were lyophilized and stored at -70 °C. Fish proteins were extracted from the same fish in triplicate.
The protein content of lyophilisates was determined according to the method proposed by Bradford (31). The analyses were performed in triplicate.
Sarcoplasmic and myofibrillar proteins were hydrolyzed after extraction from fish specimens. Specific proteolysis was performed using two proteolytic enzymes: bovine pancreatic trypsin, catalog number T1426, and porcine pancreatic pepsin, catalog number P7012, both from Sigma-Aldrich, St. Louis, MO, USA.
Hydrolysis was performed under the following conditions: protein concentration 3 mg/mL, enzyme concentration 150 μg/mL, pH=8.0 for trypsin and 2.0 for pepsin, temperature 37 °C, and time of hydrolysis 24 h. The enzymatic reaction was stopped by deactivating the enzyme at 100 °C for 5 min (32). Immediately after hydrolysis, the samples were frozen at -70 °C and lyophilized.
The solutions containing 0.5 mg/mL of the fish protein isolate soluble in salt solutions and their hydrolysates (trypsin and pepsin) were analyzed. The samples were dissolved in 6 M urea solution in a mixture of acetonitrile and water at a ratio of 100:900 by volume, pH=2.2, with the addition of trifluoroacetic acid (TFA) according to the method described by Visser et al. (33). A Shimadzu (Tokyo, Japan) set comprising two LC-10AD pumps, an SCL-10AD autosampler, an SCL-10AD controller, a CTO-10AS thermostat and an SPD-M10AW photodiode detector with a Jupiter Proteo Phenomenex® (Torrance, CA, USA) column, 250 mm×2 mm, particle diameter 4 μm, and pore diameter 90 Å, was used in the analysis. The Class-VP 5.03 Shimadzu® application was used for data analysis. Solvent A was 0.01 % (by volume) TFA solution in water. Solvent B was 0.01 % (by volume) TFA solution in acetonitrile. The gradient of solvent B was increased from 0 to 40 % during 60 min. The column was washed (40-100 % B for 60 to 65 min, 100 % B for 65 to 70 min) and equilibrated (100-0 % B for 70 to 71 min, and 0 % B for 71 to 80 min) (24,34). Data acquisition time was 80 min, flow rate 0.2 mL/min, injection volume 10 μL and column temperature 30 °C.
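The gradient program described above is a piecewise-linear function of time. As a minimal sketch, assuming linear ramps between the stated breakpoints, the percentage of solvent B at any time can be computed as follows; the function name and the table layout are illustrative only.

```python
# (time in min, % solvent B) breakpoints taken from the description above.
GRADIENT = [(0, 0), (60, 40), (65, 100), (70, 100), (71, 0), (80, 0)]


def percent_b(t):
    """Percentage of solvent B at time t (min), by linear interpolation."""
    if not GRADIENT[0][0] <= t <= GRADIENT[-1][0]:
        raise ValueError("time outside the 0-80 min program")
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t <= t1:
            return b0 + (b1 - b0) * (t - t0) / (t1 - t0)


print(percent_b(30), percent_b(62.5), percent_b(75))
```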
The RP-HPLC-MS/MS analysis was performed in a Varian 500-MS (Agilent Technologies, Santa Clara, CA, USA) ion trap mass spectrometer with electrospray ionization connected to an HPLC assembly containing two 212-L pumps, a ProStar 410 autosampler, a Degassit degasser (MetaChem Technologies®, Torrance, CA, USA) and a 2-2 nitrogen generator (Parker Domnick Hunter Scientific®, Gateshead, UK). The column, solvent system, gradient and other HPLC separation parameters were identical to those described above. Data acquisition time was 5-60 min. Mass spectrometry parameters were as follows: needle and shield voltages 5000 and 600 V, respectively; spraying and drying gas (nitrogen) pressure 35 and 30 psi, respectively; drying gas temperature 390 °C. The remaining parameters were: positive polarity, capillary voltage 100 V, retardation factor loading 100 %, isolation window 3.0, excitation storage level m/z=206.3, excitation amplitude 2.98-3.28 V, syringe volume 250 μL, sample loop volume 100 μL, needle tubing volume 15 μL, flush volume 100 μL, column oven setpoint 30 °C, data recording frequency 0.05-0.07 Hz, and a single scan averaged from five microscans; options such as the use of an air segment, headspace pressure and alarm buzzer were included (24,34). Retention times of peptides were determined after smoothing using the algorithm described by Savitzky and Golay (35), as recommended previously (24). RP-HPLC-MS/MS analyses of hydrolysates were performed in duplicate.
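The smoothing step can be sketched in a few lines of Python. The window length and polynomial order used in the study are not stated here, so the sketch assumes the classic 5-point quadratic Savitzky-Golay filter with coefficients (-3, 12, 17, 12, -3)/35 and leaves the two points at each edge unsmoothed. The retention time is then read at the apex of the smoothed trace. Function names are hypothetical.

```python
# 5-point quadratic/cubic Savitzky-Golay smoothing coefficients (sum = 35).
SG5 = (-3, 12, 17, 12, -3)


def savgol5(signal):
    """Smooth a trace with a 5-point Savitzky-Golay filter; edges are kept as-is."""
    smoothed = list(signal)
    for i in range(2, len(signal) - 2):
        window = signal[i - 2:i + 3]
        smoothed[i] = sum(c * v for c, v in zip(SG5, window)) / 35.0
    return smoothed


def retention_time(times, signal):
    """Return the time at the apex of the smoothed ion trace."""
    smoothed = savgol5(signal)
    apex = max(range(len(smoothed)), key=smoothed.__getitem__)
    return times[apex]


# Made-up ion trace, for illustration only.
times = [10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8]
trace = [0, 0, 1, 5, 9, 5, 1, 0, 0]
print(retention_time(times, trace))
```

With noisy or flat-topped traces the apex of the smoothed signal may shift between neighboring points, which is one reason a smoothing filter does not always yield an unambiguous retention time.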

Results and Discussion
The results obtained with the use of three protein sequences (herring and carp parvalbumins and carp myosin heavy chain) are presented in Fig. 1. The complete sequence of herring myosin is not available in the UniProt database (17). To date (07.03.2016), only two myosin fragments, with accession numbers Q98ST0 and Q90ZP0, can be found in the UniProt database (17) using 'Clupea harengus' together with 'myosin' as a query. Their length covers less than one tenth of the myosin heavy chain sequence. The sequences of herring myosin fragments do not contain subsequences selected in silico as potential protein markers, although they are very similar to carp myosin sequences. The peptide sequences identified using the in silico analysis are listed in Table 1. Parvalbumins form a group of proteins with high sequence variability; therefore, a peptide that is present in several protein sequences of that family is difficult to find. The SCC value of such a peptide is presented in Table 1, where it is attributed to carp parvalbumin. All fragments generated by proteolysis simulation lie within the fragments identified in the EVALLER program (20). Consequently, the value of the SCC (13) in this work is the ratio of the length of the fragment from the proteolysis simulation to the length of the fragment identified in the EVALLER program (20). The EVALLER program was originally designed to predict protein allergenicity (20). In this study, this application was used only to select possible protein fragments characterized by the highest similarity to the fragments of allergenic proteins. Protein fragments displayed by the EVALLER program often overlap with known sequential epitopes (36).
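For the special case described above, in which the proteolytic fragment lies entirely within the EVALLER fragment, the SCC reduces to a length ratio. The following sketch covers only that case and does not reproduce the general equation from reference 13; the function name and the EVALLER-like fragment are made up for illustration.

```python
def scc_contained(peptide, evaller_fragment):
    """SCC (%) when the peptide lies entirely within the EVALLER fragment."""
    if peptide not in evaller_fragment:
        raise ValueError("peptide is not contained in the EVALLER fragment")
    return 100.0 * len(peptide) / len(evaller_fragment)


# Made-up 16-residue fragment containing the 8-residue peptide DKKNVIRL.
print(scc_contained("DKKNVIRL", "AADKKNVIRLAAAAAA"))
```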
RP-HPLC was used to monitor proteolysis. Chromatograms of particular carp and herring protein fractions and products of their hydrolysis by pepsin are presented in Fig. 2. Intact proteins from isolates of sarcoplasmic and myofibrillar proteins were the dominant fractions, with retention times exceeding 70 min (Figs. 2a, c and e). These fractions disappeared during proteolysis (Figs. 2b, d and f). The dominant protein hydrolysate fractions were eluted between 20 and 70 min. In these chromatograms, the relative peak area between 10 and 70 min ranged from 90 to 99 % of the total area of peaks with retention times exceeding 10 min. Peaks eluted before 10 min contain unretained substances, such as components of the buffers used for dissolving proteins or peptides (33). Similar results were obtained using hydrolysis by trypsin.
LC-MS/MS chromatograms of the DKKNVIRL peptide from carp and herring myofibrillar protein hydrolysates are presented in Fig. 3 (fragment ions named according to Roepstorff and Fohlman (37)). Peptides are displayed as groups of fragment ions detected at the same retention time (34,38–40). Peptide fragmentation involved the formation of various types of fragment ions, including products of neutral loss. A list of experimentally identified peptides is presented in Table 2. Only 10 out of 15 peptides selected in silico were identified experimentally. Peptides were regarded as identified if they formed a group of fragment ions with identical retention times. In line with the previous recommendation (24), the difference between predicted and measured retention times should not exceed 10 %. The risk of unsuccessful identification was discussed in a previous study (34). Unsuccessful identification could be attributed to the absence of fragmentation in an ion trap mass spectrometer or to a deviation of the retention time from the predicted value. The applied protocol determines which peptides can be identified (41). The set of peptides that can be identified (so-called proteotypic peptides) may vary with the type of mass spectrometer (e.g. matrix-assisted laser desorption ionization vs. electrospray). None of the methods guarantees identification of all possible products of proteolysis (41). Proteins may undergo modifications that change the molecular mass of protein fragments or make flanking bonds resistant to proteolysis. Such modifications were responsible for the fact that some peptides, selected in silico as carp myosin fragments, were detected only in hydrolysates of herring myofibrillar proteins (Table 2). Sequences of herring myosins remain unknown, but they contain fragments identical to those of carp myosins. This suggests that potential precursors of the peptides listed in Table 2 could involve many more proteins with unknown sequences.
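The two identification criteria stated above (a group of co-eluting fragment ions, and a retention time within 10 % of the predicted value) can be sketched as a simple check. Several details are assumptions made for illustration and are not taken from the study: the 10 % is taken relative to the predicted time, ions are treated as co-eluting within a small window `rt_window`, and at least `min_ions` fragment ions are required.

```python
def within_tolerance(predicted, measured, tolerance=0.10):
    """True if the measured time deviates from the predicted one by at most 10 %."""
    return abs(measured - predicted) <= tolerance * predicted


def is_identified(observed, predicted_rt, min_ions=3, rt_window=0.05):
    """observed maps fragment ion names to measured retention times (min).

    min_ions and rt_window are hypothetical thresholds.
    """
    for rt in sorted(set(observed.values())):
        group = [name for name, t in observed.items() if abs(t - rt) <= rt_window]
        if len(group) >= min_ions and within_tolerance(predicted_rt, rt):
            return True
    return False


# Made-up observations, for illustration only.
observed = {"b2": 31.2, "y3": 31.2, "y5": 31.2, "b4": 40.0}
print(is_identified(observed, predicted_rt=30.0))
```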
Experimental retention times of peptides vary within the ranges indicated in Table 2. This phenomenon is probably an artifact that is observed when chromatograms are smoothed with the Savitzky-Golay algorithm (35). Dziuba et al. (24) observed that this algorithm does not always support the generation of unambiguous retention times.
The results of the BLAST search are presented in Table 2. Proteins containing peptide sequences whose flanking bonds are susceptible to trypsin or pepsin were divided into three categories. The first category covers parvalbumins and myosins from both edible and non-edible fish. The second category includes proteins from edible animals. Edible species are those used in the food industry, such as chicken (Gallus gallus) and turkey (Meleagris gallopavo). This category does not include some exotic animals or animals that are consumed incidentally in certain countries. In this study, animals of this type are regarded as inedible and are placed in the third category. According to the recommendations formulated by Johnson et al. (4), peptides belonging to the first and third category are potential allergen markers. Johnson et al. (4) proposed a set of criteria for selecting peptide markers. They are: known sequence of precursor proteins, uniqueness, absence of chemical or enzymatic modifications in the peptide sequence, possibility of protein extraction from food ingredients or food products, possibility of peptide release with the use of selected proteolytic enzymes, and proteolysis-resistant proteases in the peptide sequence. Excluding the last two criteria, the above requirements have to be fulfilled for peptide detection. Modifications of amino acid residues can change the molecular mass of peptides or make the preceding and successive bonds inaccessible to the proteolytic enzymes used in hydrolysis (trypsin or pepsin in this study). Non-extractable proteins are not available for proteolysis. The predicted proteolytic pattern is achieved when the last two criteria are met. The last two criteria were confirmed by the detection of peptides. The peptides that do not meet any one of the last two criteria could not be identified in this study.
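The three-category scheme described above can be expressed as a small decision rule. The sketch below is one possible reading: a peptide is kept as a potential marker only if none of its precursors falls into the second category (edible non-fish animals). The function names, the record layout and the inedible example species are hypothetical.

```python
FISH, EDIBLE_NON_FISH, INEDIBLE = 1, 2, 3


def precursor_category(is_fish, is_edible):
    """Category 1: any fish; 2: edible non-fish animal; 3: inedible animal."""
    if is_fish:
        return FISH
    return EDIBLE_NON_FISH if is_edible else INEDIBLE


def is_potential_marker(precursors):
    """Accept a peptide if no precursor comes from an edible non-fish animal."""
    return all(
        precursor_category(p["is_fish"], p["is_edible"]) != EDIBLE_NON_FISH
        for p in precursors
    )


# Illustrative BLAST hit lists; the inedible species is a placeholder.
fish_only = [
    {"species": "Cyprinus carpio", "is_fish": True, "is_edible": True},
    {"species": "Clupea harengus", "is_fish": True, "is_edible": True},
    {"species": "hypothetical inedible vertebrate", "is_fish": False, "is_edible": False},
]
with_chicken = fish_only + [
    {"species": "Gallus gallus", "is_fish": False, "is_edible": True},
]
print(is_potential_marker(fish_only), is_potential_marker(with_chicken))
```

Whether a species counts as edible is a judgment that may change over time, so the `is_edible` flag would have to be curated by hand.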
The choice of peptides that do not originate from proteins with known sequences and are not unique for one protein creates new possibilities. The number of protein sequences in databases such as UniProt (17) is rapidly growing, but many sequences remain unknown. Protein sequences, including sequences of allergenic proteins, have not been studied extensively in all fish species. The above justifies the use of the principle of comparative proteomics which states that identical or highly similar fragments may occur in homologous proteins with known and unknown sequences (15). Our findings indicate that identical fragments released from carp and herring myosins can be used as protein markers. In our previous study (13), the same fragment (containing at least seven amino acid residues) was detected in homologous proteins belonging to the same family, identified based on the presence of an appropriate domain. Allergenic proteins that belong to the same family in the AllFam database (14) are usually characterized by cross-reactivity. Allergens of carp and herring belong to the same families of proteins and their presence may be detected via identification of the same markers. Peptides found within this experiment may serve as such markers.
The rapid increase in the number of known protein sequences also poses a significant problem. Unique peptides that are markers of individual proteins are increasingly difficult to find. A peptide that is initially regarded as unique may become a group marker when new protein sequences homologous to its first known precursor are discovered. New precursors can also be found in the evaluated groups of proteins and organisms. In our study, this problem was noted when a peptide with flanking bonds susceptible to trypsin or pepsin was present in proteins isolated from edible mammals, birds, reptiles or amphibians. According to a less restrictive version of the criterion proposed by Johnson et al. (4), a peptide from more than one precursor may be accepted as a marker if additional precursors are not found in food components. This restriction effectively prevents a false positive result. In our study, one peptide was a marker of a group of fish allergens when it occurred only in fish proteins or proteins from inedible vertebrates. Short peptides may be attributed to a family of homologous proteins based on the presence of an appropriate domain (13,15,16,27,42). Examples of peptides used as allergen markers and originating from more than one precursor were discussed in our previous publication (42). The examples provided in the above reference concern peptides from bovine milk and chicken egg proteins. Their precursor proteins (αs1-casein and lysozyme C) reveal interspecies conservation, understood as the presence of common fragments. Fish myosins possess the same property. Peptides that are potential markers of a smaller group of proteins are difficult to find in analyses of single taxa such as fish. The search for an appropriate peptide set requires simulation of proteolysis. The resulting fragments are then analyzed using BLAST (28) or a similar application. When protein sequences have been retrieved, additional information relating to the taxonomic lineage of the species synthesizing those proteins has to be found. Lastly, evidence indicating whether these species are suitable for human consumption has to be provided. Species that are presently not suitable for food production could be used by the food processing industry in the future. Further research is needed to devise a reproducible and rapid method of peptide selection. In this study, the EVALLER program (20) was used to speed up the process of peptide selection. This strategy delivered satisfactory results for parvalbumins, but it was less accurate for myosins. Peptide markers of the myosin tail family in the InterPro database (signature IPR002928) (43) are easy to find, but a limited group of markers specific for individual taxa within this family is more difficult to identify. In the AllFam database (14), the myosin tail family is listed as a family of allergenic proteins (AF100), but it is invertebrate myosins that are allergens. Vertebrate proteins from the myosin tail family, including myosin heavy chains, are not considered to be allergens. Myosins that are precursors of a single peptide account for approx. 15 % of the total number of proteins belonging to the myosin tail family (Table 2). There are no simple algorithms for defining groups of allergens that can be detected based on a single marker. Further research is needed to examine the prevalence of myosins of different animal organisms and their suitability for human consumption.
Although this study included only fresh fish, mass spectrometry combined with various separation techniques has also been applied to identify peptides from processed fish, e.g. cooked, canned or high pressure-treated before freezing, as well as peptides from parasites in fillets (44–47).
The detection of proteins without known sequences on the basis of peptide identification has a weak point. Even absolute quantification of a peptide cannot give information about allergen content without additional information about the source of the proteins, because the content and ratio of particular proteins, including the precursors of marker peptides, may vary among species. Peptides from parvalbumins may serve for identification of species and for quantification of proteins. Identification of peptides from myosins can help to find proteins from the myosin family attributed to fish and occurring together with fish allergens, such as parvalbumin. This information is incomplete. Peptide sequences attributed to protein families (defined according to the InterPro database (43)) instead of single sequences do not allow species identification. The alternative, the use of unique peptides, offers complete and precise information if we find a peptide originating from a precursor with a known sequence, and no information if no such precursor is known. The presence of a peptide attributed to a protein family provides partial information, but allows a warning about the presence of an allergen even if no sequence is known. Peptides attributed to protein families instead of single sequences, as recommended by Shevchenko et al. (15), may support allergen detection. The use of a more abundant peptide within a family may increase the likelihood of detecting a protein that belongs to this family but has not been sequenced to date. This work can be considered a preliminary study concentrated on the opportunity to find peptides that could serve for the detection of allergenic fish without known sequences. Myosins, which occur together with allergenic proteins, are highly conserved and provide a better opportunity to find unknown allergens than parvalbumins, which possess species-specific sequences. Many analytical problems remain to be solved, such as how to convert limits of detection of particular peptides into limits of detection of particular proteins, tissues and species, and how to quantify an uncharacterized allergen on the basis of the determined amount of a peptide.

Conclusions
An analysis of carp and herring proteins confirmed the possibility of finding peptides that are markers of proteins with unknown sequences. Such markers can be designed by abandoning the principle that peptides should be unique (should occur in one sequence only). Parvalbumin fragments of the analyzed fish can be recommended as unique markers, while myosin fragments may be recommended as group markers. Peptide markers could be fragments of allergenic proteins or of proteins that are present with them and derived from the same organism. In this experiment we identified ten peptide markers of carp and herring proteins. Two peptide markers were characteristic of parvalbumin, another two of myosin. Eight of ten identified markers were peptides occurring in fish and other animals. The detection of protein groups based on the identified peptides may be useful, in particular in view of the rapid increase in the number of proteins with known sequences. The possibility of peptide detection should be evaluated experimentally. A single peptide can be used as a marker of more than one allergenic protein and it may serve as a marker of proteins with known and unknown sequences. Bioinformatic algorithms speed up the selection of peptide markers. Peptides are identified based on MS/MS spectra and predicted retention times.
Identical gradients and columns were used for RP-HPLC and RP-HPLC-MS/MS analyses. The difference in the time of analysis and in the retention times of particular fractions resulted from the different dead volumes of the two HPLC assemblies. LC-MS/MS chromatograms of the DKKNVIRL peptide from carp and herring myofibrillar protein hydrolysates are presented in Fig. 3.

Table 1. Peptides from carp and herring proteins selected in silico as potential markers
*Accession numbers in the UniProt database are given. SCC=sequence cross-coverage

Table 2. Peptide markers of carp and herring proteins identified experimentally
a m/z=(M+2H+)/2, b m/z=M+H+, c peptide does not fulfill the criteria proposed by Johnson et al. (4). It was included in the table to demonstrate the problems associated with the detection of protein groups based on a single peptide marker