Prediction of Bioactive Peptides from Chlorella sorokiniana Proteins Using Proteomic Techniques in Combination with Bioinformatics Analyses

Chlorella is one of the most nutritionally important microalgae, with a high protein content, and can be a good source of potential bioactive peptides. In the current study, proteins isolated from Chlorella sorokiniana were subjected to in silico analysis to predict potential peptides with biological activities. Molecular characteristics of the proteins were analyzed by sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE) and proteomics techniques. A total of eight proteins were identified by proteomics techniques from 10 protein bands of the SDS-PAGE. The prediction by BIOPEP's profile of bioactive peptides tool suggested that proteins of C. sorokiniana have the highest number of dipeptidyl peptidase-IV (DPP IV) inhibitors, with a high occurrence of other bioactive peptides such as angiotensin-I converting enzyme (ACE) inhibitor, glucose uptake stimulant, antioxidant, regulating, anti-amnestic and antithrombotic peptides. In silico analysis of enzymatic hydrolysis revealed that pepsin (pH > 2), bromelain and papain were the proteases that can release relatively larger quantities of bioactive peptides. In addition, combinations of different enzymes in hydrolysis were observed to release higher numbers of bioactive peptides from the proteins than individual proteases. The results suggest that protein isolated from C. sorokiniana could be a source of high-value products with pharmaceutical and nutraceutical applications.


Introduction
Microalgae are eukaryotic unicellular organisms that grow easily on inexpensive substrates. Therefore, they are considered economical and effective raw materials for industry [1]. Many studies have sought to convert microalgae into useful products. Most have focused on the potential of microalgae for biofuel production due to their lipid content and abundant availability [1,2]. However, due to the increasing population and demand for protein, there is a call to further utilize microalgae as protein sources to shift away from animal proteins.
Microalgae are known as one of many promising alternative sources of protein, as they offer up to 50% (w/w) protein [3] with a well-balanced amino acid profile that meets human nutritional requirements [1,4]. Several studies have reported the biological activities of various microalgae protein hydrolysates, including immunostimulant and antitumor activities from C. sorokiniana [5,6], angiotensin-I converting enzyme (ACE) inhibitory and hypotensive activities from Chlorella vulgaris [7,8] and Nannochloropsis oculata [9], antioxidant effects from Navicula incerta [10]

Identified Proteins of C. sorokiniana
The protein content of the C. sorokiniana isolate was 65.08 ± 0.88%, with a yield of 4.40% (w/w, initial biomass dry basis). A total of 10 distinct protein bands of C. sorokiniana proteins were observed in the 12% acrylamide gel (Figure 1). These labeled protein bands (A-J) were used for in-gel digestion and subsequently analyzed using nanoLC-nanoESI MS/MS. The molecular weights (MWs) of the proteins estimated by SDS-PAGE were 109.02 (A), 72.08 (B), 54.15 (C), 45.72 (D), 38.33 (E), 29.04 (F), 24.96 (G), 21.13 (H), 16.59 (I), and 7.82 (J) kDa. Eight protein hits were found for the selected bands in the NCBI database, namely chloroplast rubisco activase, 50S ribosomal protein L7/L12 (chloroplast), phosphoglycerate kinase, Fe-superoxide dismutase, heat shock protein 70, ATP synthase subunit beta (chloroplast), elongation factor 2, partial, and V-type H+ ATPase subunit A, partial. The protein hits, their NCBI accession numbers, amino acid (AA) lengths, and the molecular weights reported in the NCBI database and estimated by SDS-PAGE are presented in Table 1. The MWs estimated by SDS-PAGE were comparable to the theoretical molecular weights reported in the NCBI database, except for elongation factor 2, partial and V-type H+ ATPase subunit A, partial.
C. sorokiniana is a freshwater green algae species with high protein content [28]. This species of the genus Chlorella was originally called C. pyrenoidosa [29]. The NCBI database lists a total of 20,925 proteins from C. sorokiniana. Most of them are enzymes responsible for various cell functions. According to the nanoLC-nanoESI MS/MS data, eight protein hits from the NCBI database corresponded to the proteins of C. sorokiniana. Phosphoglycerate kinase (Auxenochlorella pyrenoidosa, NCBI accession number: AKP17751.1) was detected in all selected protein bands in the SDS-PAGE. Watson et al. [30] stated that phosphoglycerate kinases derived from various protein sources are all monomers with MWs around 45 kDa. Moreover, their amino acid compositions and catalytic functions are similar [31]. The reference molecular weight for phosphoglycerate kinase in the NCBI database was 49.13 kDa; thus, the phosphoglycerate kinase found in band D (45.72 kDa) was used for further analyses. The possibility of finding the same protein in different bands is high, since proteins are denatured and separated in SDS-PAGE. In addition, the protein hits returned by the Mascot search were identified based on the matched tryptic peptides detected by mass spectrometry.
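The choice of band D for phosphoglycerate kinase can be illustrated with a short calculation. The sketch below uses the band MWs listed above and the 49.13 kDa reference value; selecting the band by smallest absolute difference is an assumption made here for illustration, not a criterion stated by the authors, and the function names are hypothetical.

```python
# Band MWs (kDa) estimated by SDS-PAGE, as listed in the text above.
BAND_MW_KDA = {
    "A": 109.02, "B": 72.08, "C": 54.15, "D": 45.72, "E": 38.33,
    "F": 29.04, "G": 24.96, "H": 21.13, "I": 16.59, "J": 7.82,
}


def nearest_band(reference_kda, bands=BAND_MW_KDA):
    """Return (label, mw) of the band whose estimated MW is closest to the reference."""
    label = min(bands, key=lambda b: abs(bands[b] - reference_kda))
    return label, bands[label]


def percent_deviation(estimated_kda, reference_kda):
    """Signed deviation of the gel estimate from the reference MW, in percent."""
    return 100.0 * (estimated_kda - reference_kda) / reference_kda


# Reference MW of phosphoglycerate kinase from the NCBI database (kDa).
label, mw = nearest_band(49.13)
print(label, mw, round(percent_deviation(mw, 49.13), 2))  # D 45.72 -6.94
```

Under this assumption the gel estimate for band D lies about 7% below the database value, which is consistent with the description of the two values as comparable.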

Identified Tryptic Peptides from C. sorokiniana Proteins
Tryptic peptides derived from the identified proteins of C. sorokiniana by in-gel digestion were evaluated by nanoLC-nanoESI MS/MS analysis. The tryptic peptides identified through mass spectrometry were those matching protein hits in the Mascot database. Tryptic peptides are generated by trypsin during in-gel digestion, which is part of the proteomics technique. The MS/MS ion search for the tryptic digests revealed that all the tryptic peptides from the identified proteins were doubly or triply charged. Figures 2 and 3 present representative spectra of the doubly and triply charged peptides from the identified proteins of C. sorokiniana by proteomics analysis. Figure 2 illustrates a doubly charged tryptic peptide from protein band D (Figure 1), with the observed signal at m/z 1053.01 marked in a red box (adjacent signal difference of 0.50, insert A), and the nanoLC-nanoESI MS/MS fragmentation spectrum of NFNNIEDGFYISPAFLDK, found in chloroplast rubisco activase (NCBI accession no. AEL29575.1), shown in insert B. Figure 3 illustrates a triply charged tryptic peptide found in the same identified protein of the same protein band D. The observed signal at m/z 618.66 is also marked in a red box (adjacent signal difference of 0.33, insert A), and the nanoLC-nanoESI MS/MS fragmentation spectrum of the tryptic peptide LVDAFPGQSIDFFGALR, found in chloroplast rubisco activase (NCBI accession no. AEL29575.1), is shown in insert B.
In the identification of proteins by proteomics analysis, trypsin is usually used to digest proteins in the gel [32]. Trypsin hydrolyzes proteins specifically at the carboxyl side of arginine or lysine residues, but poorly when these residues are followed by proline. Tryptic peptides therefore end in a basic residue, which, together with the N-terminal amino group, explains why they were observed as doubly or triply charged ions in ESI in the MS/MS ion search [33].
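The charge assignments quoted for Figures 2 and 3 follow from the spacing between adjacent isotopic signals, which is approximately 1/z on the m/z axis. The sketch below is a minimal illustration of that relation using the values reported in the text; the function names are hypothetical, and the neutral-mass calculation assumes that the marked signal is the monoisotopic peak, which the text does not state.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def charge_from_spacing(delta_mz):
    """Charge state implied by the spacing between adjacent isotopic signals (~1/z)."""
    if delta_mz <= 0:
        raise ValueError("isotopic spacing must be positive")
    return round(1.0 / delta_mz)


def neutral_mass(mz, charge):
    """Neutral mass M of an [M + zH]z+ ion observed at the given m/z."""
    return charge * (mz - PROTON_MASS)


# (observed m/z, adjacent signal difference) as reported for Figures 2 and 3.
for mz, spacing in [(1053.01, 0.50), (618.66, 0.33)]:
    z = charge_from_spacing(spacing)
    print(mz, spacing, z, round(neutral_mass(mz, z), 2))
```

A spacing of 0.50 gives z = 2 and a spacing of 0.33 gives z = 3, in line with the doubly and triply charged assignments.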

Potential Bioactive Peptides from Identified Proteins in C. sorokiniana
Potential bioactive peptides present in the identified proteins of C. sorokiniana were investigated using the BIOPEP-UWM database. Amino acid sequences of six proteins, namely chloroplast rubisco activase, 50S ribosomal protein L7/L12 (chloroplast), phosphoglycerate kinase, Fe-superoxide dismutase, heat shock protein 70 and ATP synthase subunit beta (chloroplast), were chosen because they were relatively abundant components of the C. sorokiniana proteins in SDS-PAGE. Moreover, their molecular weights estimated by SDS-PAGE corresponded to those reported in the NCBI database (Table 1). The profile of the potential bioactive peptides, their biological activities (ACE inhibitory, antioxidant, anti-amnestic, antithrombotic, stimulating, regulating, DPP IV inhibitory), and frequencies are summarized in Table 2. Results revealed that most of the potential bioactive peptides were dipeptides or tripeptides with multiple biological activities. The number of bioactive peptides was determined from the amino acid sequences predicted to yield potential bioactive peptides. The BIOPEP database reports peptides and their bioactivities for submitted protein sequences according to the information in the database.
Figure 5 shows the molecular weights of the tryptic peptides in phosphoglycerate kinase, which correspond to the theoretical tryptic peptides at amino acid positions 232-244, 258-268, 285-298, 303-313, 383-411, and 436-465 (matched tryptic peptides shown in red letters). There were 297 DPP IV inhibitor, 224 ACE inhibitor, 23 antioxidant, 33 stimulant, 5 anti-amnestic, 4 antithrombotic, and 6 regulatory peptides embedded in the amino acid sequence of phosphoglycerate kinase. Moreover, the profiles of the bioactive peptides of 50S ribosomal protein L7/L12 (chloroplast), Fe-superoxide dismutase, heat shock protein 70 and ATP synthase subunit beta (chloroplast) also show the presence of the above-mentioned bioactive peptides in these proteins, except that Fe-superoxide dismutase does not show anti-amnestic, antithrombotic, or regulating peptides. In all the proteins, DPP IV and ACE inhibitors were the most abundant bioactive peptides.
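The profiling step can be pictured as a search for known active fragments inside a protein sequence. The sketch below is only an illustration of that idea and is not the BIOPEP-UWM implementation: the motif table is limited to the four dipeptides reported as ACE inhibitors from C. sorokiniana by Lin et al. [14], the toy sequence is hypothetical, and the function names are invented for this example.

```python
from collections import Counter

# Illustrative motif table only: the four dipeptides reported as ACE inhibitors
# by Lin et al. [14]. The BIOPEP-UWM database holds far more entries.
MOTIFS = {
    "WV": "ACE inhibitor",
    "VW": "ACE inhibitor",
    "IW": "ACE inhibitor",
    "LW": "ACE inhibitor",
}


def find_motifs(sequence, motifs=MOTIFS):
    """Return (position, motif, activity) for every occurrence, overlaps included."""
    hits = []
    for motif, activity in motifs.items():
        start = sequence.find(motif)
        while start != -1:
            hits.append((start + 1, motif, activity))  # 1-based position
            start = sequence.find(motif, start + 1)
    return sorted(hits)


def profile(sequence, motifs=MOTIFS):
    """Count hits per activity, analogous to a per-protein activity profile."""
    return Counter(activity for _, _, activity in find_motifs(sequence, motifs))


toy = "MIWLWVWA"  # hypothetical sequence, not a C. sorokiniana protein
print(find_motifs(toy))
print(profile(toy))  # Counter({'ACE inhibitor': 4})
```

Because overlapping fragments are counted separately, a short stretch of sequence can contribute several hits, which is one reason the predicted counts for whole proteins run into the hundreds.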
The amino acid composition and sequence of the proteins largely determine the presence of these bioactive peptides. Results also revealed that most of the DPP IV inhibitory peptides present in the identified proteins had proline (P), alanine (A), glycine (G), valine (V) and leucine (L) amino acid residues. DPP IV preferably cleaves dipeptides with proline and alanine residues at the N-terminal side of the peptide [34]. It also has relatively lower cleavage rates with serine, glycine, leucine, and valine [35,36]. Moreover, the presence of basic and hydrophobic amino acids at the N-terminal side of the peptides could enhance the cleavage susceptibility of the substrate [34,37]. DPP IV inhibitors have also been reported from various protein sources by in silico approaches, including barley, canola, oat, soybean, wheat, quinoa, chicken egg, bovine milk, bovine meat, pig, tuna, Atlantic salmon, chum salmon, tilapia skin and frame, and Palmaria palmata [21,27,38-40]. On the other hand, the abundance of ACE inhibitory peptides in the identified proteins might also have been influenced by the amino acid compositions of the proteins. Peptides with amino acid residues such as phenylalanine (F), tyrosine (Y), tryptophan (W), or proline (P) at the C-terminal side have been reported to exhibit highly potent ACE inhibitory activity [41-45]. The amino acid residue adjacent to proline can also influence the potency of the ACE inhibitor, which is usually enhanced by hydrophobic amino acids [46]. In silico analysis of different proteins has revealed an abundance of ACE inhibitors embedded in various protein sequences [47-49]. In previous studies, some amino acid sequences of ACE inhibitory peptides from C. sorokiniana were discovered. Lin et al. [14] reported that the IC50 values of WV, VW, IW, and LW were 307.61, 0.58, 0.50, and 1.11 µM, respectively. Moreover, C. sorokiniana protein hydrolysates could reduce systolic and diastolic blood pressure by 20 and 21 mm Hg, respectively. Suetsuna and Chen [8] also reported several amino acid sequences with potential antihypertensive activity after oral administration, such as IVVE.
For many years now, in silico analysis has been successfully used to predict the potential application of various proteins as a source of bioactive peptides [22]. It provides sufficient information for determining the potential biological activity of proteins and is much faster than conventional methods [21]. The results of the in silico analysis by BIOPEP-UWM suggest the potential of C. sorokiniana proteins for pharmaceutical application, as demonstrated by their predicted bioactivities. These peptides are inactive within the intact proteins and need to be released in order to perform their functions [50]. The potential bioactivities of the proteins after digestion by various proteases can be predicted with the BIOPEP-UWM database tool.

Prediction of Potential Bioactive Peptides after Protease Cleavage using BIOPEP-UWM Tool
Identified proteins such as chloroplast rubisco activase, phosphoglycerate kinase, Fe-superoxide dismutase, heat shock protein 70 and ATP synthase subunit beta (chloroplast) were further analyzed using the "enzyme action" tool in the BIOPEP-UWM database; these proteins showed the largest numbers of bioactivities in their profiles of bioactive peptides (Table 2). Results of the 15 simulations of enzymatic hydrolysis for each protein sequence are presented in Table 3. The table shows the number of bioactive peptides with specific bioactivities after hydrolysis of the individual proteins by various proteases. The results revealed that DPP IV inhibitory peptides were predominantly produced from the selected proteins by the different proteases. ACE inhibitory peptides were also released in relatively high numbers, but lower than those of DPP IV inhibitors. This is consistent with the profile of the potential bioactive peptides from the proteins in Table 2. Bromelain, papain, ficin and pepsin (pH > 2) were the individual proteases that released the most diverse and largest numbers of peptides with certain biological activities from all the selected proteins. Meanwhile, trypsin released the lowest number of bioactive peptides after in silico hydrolysis. Trypsin is the most commonly used enzyme in the proteomics approach [32]; however, in the in silico analysis, it did not release significant numbers of potential bioactive peptides. Nonetheless, based on the results, other single-action enzymes could also release relatively high numbers of bioactive peptides. The BIOPEP-UWM database also allows enzymes to be combined in hydrolysis. A combination of two to a maximum of three enzymes could be used in the hydrolysis simulation of the proteins. The combination of three enzymes (trypsin, α-chymotrypsin, and pepsin) has been reported to produce potential anti-inflammatory peptides from microalgae, such as LDAVNR and MMLDF [12].
Table 3 reveals that the combined action of two to three enzymes could lead to the release of higher numbers of bioactive peptides from the selected proteins. This implies that the combined action of enzymes is more effective in cleaving peptide bonds than single-enzyme action, except for pepsin, which released almost the same number of peptides as the combined enzyme action. Pepsin has been reported in several in vitro studies to produce various bioactive peptides from microalgae hydrolysates, such as ACE inhibitory and antioxidant peptides. Samarakoon et al. [9] reported that pepsin generated more potential ACE inhibitory peptides than other proteases, such as GMNNLTP (IC50: 123 µM) and LEQ (IC50: 173 µM). Ko et al. [11] also identified LNGDVW from peptic hydrolysates, which strongly scavenged peroxyl, DPPH and hydroxyl radicals with IC50 values of 0.02, 0.92 and 1.42 mM, respectively. Moreover, peptic hydrolysates from microalgae efficiently generated strong antioxidant activities [51,52].
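The hydrolysis simulations can be thought of as rule-based cleavage followed by a lookup of the released fragments. The sketch below illustrates this with the trypsin rule described earlier (cleavage after lysine or arginine, but not before proline) and is not the BIOPEP-UWM "enzyme action" implementation. The set of known active peptides contains only the two anti-inflammatory sequences cited from [12]; the toy sequence and the function names are hypothetical.

```python
def trypsin_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    if not sequence:
        return []
    peptides, start = [], 0
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # C-terminal fragment
    return peptides


def released_actives(peptides, known_actives):
    """Keep only released fragments that exactly match a known active peptide."""
    return [p for p in peptides if p in known_actives]


# Anti-inflammatory peptides cited in the text from [12]; illustrative lookup set.
KNOWN_ACTIVES = {"LDAVNR", "MMLDF"}

toy = "GGKLDAVNRAAK"  # hypothetical sequence, not a C. sorokiniana protein
fragments = trypsin_digest(toy)
print(fragments)                                   # ['GGK', 'LDAVNR', 'AAK']
print(released_actives(fragments, KNOWN_ACTIVES))  # ['LDAVNR']
```

Only fragments released intact count as hits in this sketch, which is why a protease with few cleavage sites, such as trypsin, can yield few short bioactive peptides even when many are embedded in the sequence.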
Furthermore, in comparison to the other three proteins, ATP synthase subunit beta demonstrated a higher tendency to release bioactive peptides with the different proteases. However, these theoretically produced bioactive peptides may not always show comparable function in vitro and in vivo; thus, further in vitro and in vivo studies of these peptides should be conducted. Nevertheless, BIOPEP's "enzyme action" tool was able to provide reference information on the possible bioactive peptides that could be released from the selected proteins using various proteases.
Table 3. Number of predicted potential bioactive peptides to be released from identified proteins of C. sorokiniana using BIOPEP's "enzyme action" tool.

Materials
The microalga C. sorokiniana was obtained from Taiwan Chlorella Manufacturing Co., Ltd. (Taipei, Taiwan), considered the largest producer of Chlorella, with an average annual production of 400 tons of dried biomass [1]. All reagents and chemicals used were of analytical grade.

Protein Isolation
The protein isolation process was adapted from the procedure of Parimi et al. [53] with modifications. Briefly, a C. sorokiniana biomass slurry was prepared at a 1:16 (w/v) ratio. The slurry was pretreated by sonication for 1 h, followed by alkaline protein extraction by solubilization at pH 11.38 using 2 M NaOH for 35 min with stirring. The supernatant was then subjected to isoelectric precipitation at pH 4.01 with 1 M HCl and stirred for 60 min. Centrifugation at 8750 × g for 35 min was used for solid-liquid separation in both the solubilization and precipitation steps. The protein isolate was lyophilized and stored at −20 °C until further use. The modified Lowry method [54] was used to determine the protein content of the isolate.

SDS-PAGE Analysis
SDS-PAGE was performed according to the method described by Schägger and Von Jagow [55], using a 4% stacking gel (w/v) and a 12% polyacrylamide resolving gel (w/v). Ten milligrams of protein isolate was dissolved in 1 mL of denaturing sample buffer (0.5 M Tris-HCl pH 6.8, glycerol, 10% SDS (w/v), 0.5% bromophenol blue (w/v), β-mercaptoethanol) and heated at 95 °C. Then, 10 µL of the sample was loaded into the sample wells. Protein separation was carried out at 80 V for 30 min, followed by 110 V for 90 min for the resolving gel, using a Mini Protean II unit (Bio-Rad Laboratories, Hercules, CA, USA). The gel was stained for 40 min with Coomassie Brilliant Blue R250 (Bio-Rad). The gel was destained three times using water/methanol/acetic acid (7/2/1, v/v/v) for 15 min per cycle with shaking on an orbital shaker (Fristek S10, Taichung city, Taiwan). The molecular mass of the proteins was estimated using a protein molecular mass marker (250 to 10 kDa, Bio-Rad), of which 5 µL was loaded in the sample well. The gel was scanned with an E-Box VX5 (Vilber Lourmat, Paris, France) and the captured image was analyzed using Vision Capt software (V16.08a, Vilber Lourmat, Paris, France).

In-Gel Tryptic Digestion
The following proteomics experiments were carried out at Academia Sinica, Nangang District, Taipei City, Taiwan. Proteomics techniques were adapted from the methods described by Chang et al. [18]. Gel slicing and in-gel digestion were performed using the combined modified methods of Rosenfeld et al. [56] and Shevchenko et al. [32]. Briefly, 10 intensely stained protein bands were excised from the SDS-PAGE gel for in-gel digestion. The gel pieces were destained with 25 mM ammonium bicarbonate (ABC)/50% acetonitrile (ACN) solution in microcentrifuge PP tubes. The destained gel pieces were soaked in 100 µL of 50 mM dithioerythritol (DTE)/25 mM ABC at 37 °C for 1 h. The tubes were centrifuged and the DTE solution was removed. For the alkylation step, the gel pieces were then soaked in 100 µL of 100 mM iodoacetamide (IAM)/25 mM ABC at room temperature in the dark for 1 h. The IAM solution was removed after centrifugation. The gel pieces were washed by soaking in 200 µL of 50% ACN/25 mM ABC for 15 min. The solution was removed after centrifugation and the process was repeated four times. The gel slices were then soaked in 100 µL of 100% ACN for 5 min, repeated twice, and the solution was discarded after centrifugation. The gel slices were dried for 5 min using a Speed Vac (Thermo Scientific, Waltham, MA, USA). Digestion was started by adding Lys-C in 25 mM ABC (enzyme:protein, 1:50) and incubating the mixture for 1 h at 37 °C. Afterwards, the same amount of trypsin was added and the mixture was incubated for 16 h at 37 °C. The tryptic peptides were then extracted with 50 µL of 50% ACN/5% trifluoroacetic acid (TFA). The peptide extracts were transferred to new tubes and dried using a Speed Vac (Thermo Scientific, Waltham, MA, USA). Finally, the peptide extracts were purified using C18 Zip-Tip. The purified peptide extracts were used for the nanoLC-nanoESI MS/MS analysis.

Nanoliquid Chromatography-Nanoelectrospray Ionization Tandem Mass Spectrometry (NanoLC-nanoESI MS/MS) Analysis
The dried tryptic peptide digest was subjected to nanoLC-nanoESI MS/MS analysis using a nanoAcquity system (Waters, Milford, MA, USA) connected to an LTQ Orbitrap Velos hybrid mass spectrometer (Thermo Electron, Bremen, Germany) equipped with a PicoView nanospray interface (New Objective, Woburn, MA, USA). The tryptic peptide mixtures were loaded onto a 75 µm ID, 25 cm length C18 BEH column (Waters, Milford, MA, USA) packed with 1.7 µm particles with a pore width of 130 Å. Separation was performed using a segmented gradient from 5 to 35% solvent B (acetonitrile with 0.1% formic acid) over 60 min at a flow rate of 300 nL/min and a column temperature of 35 °C. Solvent A was 0.1% formic acid in water (v/v). The mass spectrometer was operated in the data-dependent mode. In brief, survey full-scan MS spectra (m/z 350-1600) were acquired in the orbitrap with the resolution set to 60 K at m/z 400 and the automatic gain control (AGC) target at 10^6. The 20 most intense ions were sequentially isolated for collision-induced dissociation (CID) MS/MS fragmentation and detection in the linear ion trap (AGC target at 10,000), with previously selected ions dynamically excluded for 60 s. Ions with single or unrecognized charge states were also excluded. The LTQ-Orbitrap data were acquired at the Academia Sinica Common Mass Spectrometry Facilities located at the Institute of Biological Chemistry, Academia Sinica, Nangang District, Taipei City, Taiwan.
The Mascot ion score was −10 × log10(P), where P is the probability that the observed match is a random event. Individual ion scores > 45 indicated identity or extensive homology (p < 0.05). Protein scores were derived from ion scores as a non-probabilistic basis for ranking protein hits (Matrix Science, London, United Kingdom). The sequence coverage of protein hits was expressed as a percentage (%), indicating the sequence homology of the identified tryptic peptides from C. sorokiniana to the corresponding protein hits based on the Mascot MS/MS ion search results [21].
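The relation between the ion score and the match probability can be made concrete with a short conversion. The sketch below only applies the −10 × log10(P) definition given above in both directions; it does not reproduce how Mascot derives P or its significance thresholds, and the function names are hypothetical.

```python
import math


def ion_score(p):
    """Ion score for a probability p that the match is a random event."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


def probability_from_score(score):
    """Inverse relation: probability of a random match for a given ion score."""
    return 10.0 ** (-score / 10.0)


print(round(ion_score(1e-5), 1))          # 50.0
print(f"{probability_from_score(45):.2e}")  # 3.16e-05
```

Each increase of 10 in the score corresponds to a tenfold decrease in the probability that the match arose by chance.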

In Silico Analysis of Bioactive Peptides and Enzyme Cleavages using BIOPEP-UWM Database Tools
Sequences of the identified C. sorokiniana proteins from the NCBI database were analyzed for bioactive peptides and enzyme cleavages using the BIOPEP-UWM database (http://www.uwm.edu.pl/biochemia/index.php/pl/biopep), accessed on March 15, 2018 [23], as described by Cheung et al. [25] with modifications. Briefly, the bioactivities, sequences, number and location of the peptides were obtained from the sequences of the identified proteins using the "profiles of potential bioactivity" tool. Moreover, the sequences of the identified proteins were examined using the "enzyme action" tool to simulate enzymatic hydrolysis. A total of 15 enzymatic hydrolysis simulations (composed of 12 individual proteases, one double enzyme action, and two triple enzyme actions) were conducted for each protein sequence. A list of all the potential bioactive peptides was obtained after directing the theoretical peptide sequence data to the "search for active fragments" option. The frequency of occurrence of the bioactive peptides in the intact proteins was computed as A = a/N, where A is the occurrence frequency, a is the number of bioactive peptides and N is the total number of amino acid residues in the protein sequence.
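As a worked illustration of A = a/N, the sketch below applies the formula to the peptide counts reported for phosphoglycerate kinase. The residue count used here is a round placeholder chosen only for the example, since the actual sequence length is given in Table 1 and is not reproduced in this text; the function name is likewise hypothetical.

```python
def occurrence_frequency(n_bioactive, n_residues):
    """A = a / N: bioactive peptides per amino acid residue of the protein."""
    if n_residues <= 0:
        raise ValueError("the protein must contain at least one residue")
    return n_bioactive / n_residues


# Counts reported in the text for phosphoglycerate kinase.
counts = {
    "DPP IV inhibitor": 297, "ACE inhibitor": 224, "antioxidant": 23,
    "stimulant": 33, "anti-amnestic": 5, "antithrombotic": 4, "regulatory": 6,
}

N_PLACEHOLDER = 500  # hypothetical residue count, NOT the value from Table 1

for activity, a in counts.items():
    print(f"{activity}: A = {occurrence_frequency(a, N_PLACEHOLDER):.3f}")
```

Normalizing by sequence length in this way allows proteins of different sizes to be compared on the same scale.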

Conclusions
Proteomics techniques coupled with in silico analysis, as used in this study, provided a rapid method to identify the isolated proteins of C. sorokiniana, to predict potential bioactivities and to determine the proteases that theoretically release more bioactive peptides. The proteomics technique identified tryptic peptides corresponding to eight proteins from the microalga. The in silico analysis using BIOPEP-UWM database tools revealed that the combined action of mixed enzymes and the single-enzyme action of pepsin (pH > 2) could lead to the production of more diverse and larger numbers of potential bioactive peptides embedded in the protein sequences. According to the results, C. sorokiniana proteins are potential sources of bioactive peptides with various bioactivities. With the use of appropriate extraction methods and purification techniques for certain predicted bioactivities, these proteins could be a good alternative source of high-value compounds for pharmaceutical, medical, cosmetic and functional food applications to aid in human health maintenance and enhancement.