Dataset of cocoa aspartic protease cleavage sites

The data provide information in support of the research article, “The cleavage specificity of the aspartic protease of cocoa beans involved in the generation of the cocoa-specific aroma precursors” (Janek et al., 2016) [1]. Three different protein substrates were partially digested with the aspartic protease isolated from cocoa beans and commercial pepsin, respectively. The obtained peptide fragments were analyzed by matrix-assisted laser-desorption/ionization time-of-flight mass spectrometry (MALDI-TOF/TOF-MS/MS) and identified using the MASCOT server. The N- and C-terminal ends of the peptide fragments were used to identify the corresponding in-vitro cleavage sites by comparison with the amino acid sequences of the substrate proteins. The same procedure was applied to identify the cleavage sites used by the cocoa aspartic protease during cocoa fermentation starting from the published amino acid sequences of oligopeptides isolated from fermented cocoa beans.


Type of data
Peptide mixtures obtained by cleavage of different substrate proteins with purified cocoa aspartic protease or pepsin were analyzed by liquid chromatography-MALDI-TOF/TOF-MS/MS using a 4700 Proteomics Analyzer (Applied Biosystems, Framingham, MA) coupled off-line with an Ultimate HPLC system and a Probot fractionation device (both Dionex/Thermo, Idstein, Germany). Amino acid sequences of oligopeptides isolated from fermented cocoa beans were taken from the literature.

Data format
Analyzed

Experimental factors
Samples were prepared by partial digestion of different substrate proteins with purified cocoa aspartic protease or pepsin. Prior to LC-MALDI-MS/MS analyses, the peptide mixtures were reduced with dithiothreitol and their cysteine residues alkylated with iodoacetamide.

Experimental features
Generation of oligopeptide mixtures by digestion of substrate proteins with purified cocoa aspartic protease or pepsin, fractionation and sequencing of the peptides by LC-MALDI-TOF/TOF-MS/MS, and subsequent identification of the cleavage sites. Data were compared with the cleavage sites predicted from the sequences of oligopeptides isolated from fermented cocoa beans and analyzed by liquid chromatography-tandem mass spectrometry. The abundance of the different amino acid residues in the P4-P4' positions around the cleavage sites was analyzed to gain insight into the particular cleavage specificity of the cocoa aspartic protease.

Data source location
Berlin, Germany, and Jena, Germany

Data accessibility
Data are within this article.

Value of the data
1. These data characterize the cleavage sites of the cocoa aspartic protease.
2. Characterization of the cleavage specificity of an endoprotease requires the comparative analysis of the amino acid sequences around many of its cleavage sites.
3. We provide a strategy enabling the discrimination between specific and unspecific cleavage sites of an endoprotease.
4. Our data demonstrate the limitation of the identification of protease cleavage sites by LC-MALDI-TOF/TOF-MS/MS versus ESI-MS/MS.
5. These data will contribute to our knowledge concerning the formation of the cocoa-specific aroma precursors.

Data
Three tables are presented. Table 1 contains the cleavage sites in different substrate proteins used by the cocoa aspartic protease and pepsin, respectively, identified by in-vitro proteolysis. Table 2 shows the putative cleavage sites of the cocoa aspartic protease used during commercial cocoa fermentation. Table 3 shows the abundance of the different amino acids in the P4 to P4' positions around the cleavage sites used by the cocoa aspartic protease during in-vitro proteolysis and cocoa fermentation, respectively.

Table 1. Specific and common cleavage sites of cocoa aspartic protease and pepsin in different protein substrates (a).
Column headers: Substrate; Cleavage sites specific for the cocoa protease.
(a) Octapeptide sequences around the cleavage sites for the cocoa aspartic protease and pepsin, respectively, detected by partial proteolysis of myoglobin, the cocoa 21-kDa seed protein, and the cocoa vicilin-class (7S) globulin. Data are listed separately for sites exclusively cleaved by the cocoa aspartic protease, sites exclusively cleaved by pepsin, and sites cleaved by both proteases (= unspecific cleavage sites).

Experimental design, materials and methods

Determination of cleavage sites by in-vitro proteolysis
Cocoa protease, the cocoa 21-kDa seed protein, and the cocoa vicilin-class (7S) globular storage protein were isolated from the acetone-dry powder of unfermented cocoa beans essentially as previously described [1,2]. Horse myoglobin or one of the individual cocoa seed proteins (10 mg in 1 ml of 20 mM sodium acetate, pH 5.0) was partially digested with either 100 mg of purified cocoa aspartic protease or 50 mg of commercial porcine pepsin (Sigma-Aldrich Chemie, Taufkirchen, Germany). The obtained peptides were reduced with dithiothreitol, and their cysteine residues were subsequently alkylated with iodoacetamide, before being analyzed by mass spectrometry.

Footnote to Table 2: Since the peptides formed during cocoa fermentation are modified by a carboxypeptidase [2,5], the N-terminal cleavage sites are more reliable than the C-terminal ones. For the C-terminal end of an oligopeptide, a cleavage site located downstream was predicted whenever the resulting peptide fragment could be trimmed by the cocoa carboxypeptidase [6] to the finally detected oligopeptide (indicated by "+CP").

Table 3. Abundance of different amino acid residues in the P4 to P4' positions of the predicted and experimentally detected cleavage sites of the cocoa aspartic protease.
(a) Amino acid positions around the cleavage sites.
(b) Predicted from the N-terminal and C-terminal ends of oligopeptides isolated from fermented cocoa beans [3,4].
(c) Detected by in-vitro digestion of three different protein substrates with the cocoa aspartic protease (compare Table 1).

The different cleavage sites were determined by locating the N- and C-terminal ends of the oligopeptides within the amino acid sequence of the corresponding substrate protein. The octapeptide sequences around the cleavage sites and their positions in the corresponding substrate proteins are listed in Table 1. Three classes of cleavage sites were found and are listed separately (Table 1): (1) those exclusively cleaved by the cocoa aspartic protease (= specific cleavage sites of the cocoa enzyme), (2) those cleaved by both the cocoa aspartic protease and pepsin (= unspecific cleavage sites of the cocoa enzyme), and (3) those exclusively cleaved by pepsin.
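The mapping of peptide termini onto a substrate sequence, and the sorting of the resulting sites into the three classes, can be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in [1]: the function names, the toy sequence, and the toy peptides are hypothetical, each peptide is assumed to occur exactly once in the substrate, and windows that run past a protein terminus are padded with "-".

```python
def cleavage_sites(protein, peptides):
    """Map each cleavage position to its P4-P4' octapeptide.

    Position k means the cleaved bond lies between residues k and k+1
    (1-based numbering). The protein's own termini are not cleavage sites.
    """
    sites = {}
    for pep in peptides:
        start = protein.find(pep)      # first occurrence only (assumption)
        if start == -1:
            continue                   # peptide is not part of this substrate
        for k in (start, start + len(pep)):   # N- and C-terminal cut
            if 0 < k < len(protein):
                left = protein[max(0, k - 4):k].rjust(4, "-")   # P4..P1
                right = protein[k:k + 4].ljust(4, "-")          # P1'..P4'
                sites[k] = left + right
    return sites


def classify_sites(cocoa, pepsin):
    """Split positions into cocoa-specific, common, and pepsin-specific."""
    cocoa, pepsin = set(cocoa), set(pepsin)
    return {
        "cocoa_specific": cocoa - pepsin,
        "common": cocoa & pepsin,
        "pepsin_specific": pepsin - cocoa,
    }


# Toy data, not taken from the article.
protein = "ACDEFGHIKLMNPQRSTVWY"
cocoa = cleavage_sites(protein, ["EFGHIK", "LMNPQ"])
pepsin = cleavage_sites(protein, ["EFGHIK", "STVWY"])
classes = classify_sites(cocoa, pepsin)
print(cocoa)    # {3: '-ACDEFGH', 9: 'GHIKLMNP', 14: 'MNPQRSTV'}
print(classes)
```

Keeping the sites as plain sets of positions makes the three classes simple set differences and intersections. A real analysis would also have to handle peptides that occur more than once in a substrate.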

Determination of putative in-situ cleavage sites used during cocoa fermentation
Sequences of oligopeptides isolated from fermented cocoa beans and determined by ESI-MS/MS were taken from the literature [3,4] and used to identify the putative in-situ cleavage sites of the cocoa aspartic protease in the 21-kDa cocoa seed protein and in the vicilin-class (7S) globulin of cocoa beans. The octapeptide sequences around these putative cleavage sites, and their positions in the amino acid sequences of the two proteins, are listed in Table 2. Because the oligopeptides generated during fermentation are modified to varying degrees at their C-terminal ends by a carboxypeptidase [5], cleavage sites predicted from the C-terminal ends are less reliable than those predicted from the N-terminal ends. Owing to the known cleavage specificity of this carboxypeptidase [6], however, the putative cleavage sites corresponding to the C-terminal ends of the original cleavage products of the cocoa aspartic protease can still be predicted with some reliability. When the predicted C-terminal cleavage site was assumed to lie downstream of the C-terminal end of the isolated peptide, this is marked by "+CP". To date, 87 different oligopeptides have been isolated from fermented cocoa beans and sequenced by mass spectrometry [3,4]. All of them were derived from either the 21-kDa seed protein or the cocoa vicilin-class (7S) globulin [3,4].
From the N- and C-terminal ends of these 87 oligopeptides, 98 putative cleavage sites of the cocoa aspartic protease were predicted (Table 2), 23 of which are identical to cleavage sites detected by in-vitro proteolysis (Tables 1 and 2).
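The number of distinct sites is smaller than twice the number of peptides because several peptides share a terminus. The counting and the comparison with the in-vitro sites can be sketched as below. The function name and all numbers are hypothetical toy values. The sketch takes the detected C-terminus at face value and so ignores the "+CP" correction for carboxypeptidase trimming.

```python
def putative_sites(spans, protein_length):
    """Collect distinct cleavage positions from peptide spans.

    Each span is (first, last): the 1-based numbers of the peptide's first
    and last residue in the precursor protein. A cut at position k lies
    between residues k and k+1; the protein's own termini are skipped.
    """
    sites = set()
    for first, last in spans:
        for k in (first - 1, last):
            if 0 < k < protein_length:
                sites.add(k)
    return sites


# Toy data: three peptides from a 20-residue protein.
spans = [(5, 12), (5, 10), (13, 20)]
in_situ = putative_sites(spans, protein_length=20)
in_vitro = {4, 9, 12}            # hypothetical in-vitro cleavage positions
shared = in_situ & in_vitro      # sites found by both approaches

print(sorted(in_situ))  # [4, 10, 12]
print(sorted(shared))   # [4, 12]
```

In this toy case, three peptides give only three distinct sites, two of which also occur among the in-vitro sites.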
To gain insight into the cleavage specificity of the cocoa aspartic protease, the relative abundance of the different amino acid residues in the P4-P4' positions around the cleavage sites was determined (Table 3). This was done both for the cleavage sites putatively used in situ (during the fermentation process) and for the cleavage sites determined by in-vitro proteolysis (Table 3). In the latter case, all cleavage sites of the cocoa aspartic protease were considered, i.e. without the discrimination between specific and unspecific cleavage sites made in Table 1. Considerable differences were observed in the relative abundance of some amino acids in the P4-P4' positions between the in-situ and the in-vitro cleavage sites (Table 3).

The MALDI-TOF-MS analysis used to identify the peptide fragments generated during in-vitro proteolysis [1] is restricted to ions with m/z > 799, because of ions generated from the matrix components. As recently reported, however, most peptides present in fermented cocoa beans have molecular masses below this limit [3,4]. Therefore, considerably more peptides and their corresponding N- and C-terminal ends can be detected and analyzed by LC-ESI-MS/MS than by LC-MALDI-TOF/TOF-MS/MS.
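The position-wise abundance analysis summarized in Table 3 can be illustrated with a short sketch. It assumes that abundance is expressed as the percentage of sites carrying a given residue at each position; the function name and the octapeptides are hypothetical toy values, and "-" marks a position beyond a protein terminus.

```python
from collections import Counter

POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]


def position_abundance(octapeptides):
    """Return {position: {residue: percent}} for aligned P4-P4' windows."""
    table = {}
    for i, pos in enumerate(POSITIONS):
        # Residues observed at this position, skipping padding characters.
        residues = [w[i] for w in octapeptides if w[i] != "-"]
        counts = Counter(residues)
        total = len(residues)
        table[pos] = {aa: 100.0 * n / total for aa, n in counts.items()}
    return table


# Toy windows, not taken from Table 1 or Table 2.
windows = ["AAFLVAAA", "GGFLAGGG", "SSALVSSS", "TTFFVTTT"]
abundance = position_abundance(windows)
print(abundance["P1"])   # {'L': 75.0, 'F': 25.0}
print(abundance["P1'"])  # {'V': 75.0, 'A': 25.0}
```

Running the same function separately on the in-situ and the in-vitro octapeptides gives two tables that can be compared position by position.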

Transparency document. Supporting information
Supplementary data associated with this article can be found in the online version at http://dx.doi.org/10.1016/j.dib.2016.06.021.