Characterization and differentiation of quinoa seed proteomes by label-free mass spectrometry-based shotgun proteomics

Quinoa seed proteins are of prime importance in human nutrition and in plant breeding for cultivar identification and improvement. In this study, proteins from seeds of black, red and white quinoa from Peru and white quinoa from Bolivia (also known as royal) were extracted, digested and analyzed by nano-liquid chromatography coupled to Orbitrap tandem mass spectrometry (LC-MS/MS). The raw mass spectral data were processed for identification and label-free quantification (LFQ) using MaxQuant/Andromeda against a quinoa-specific database from the National Center for Biotechnology Information (NCBI). In total, 1,211 quinoa proteins (85 of them uncharacterized) were identified. Inspection and visualization using Venn diagrams, heat maps and Gene Ontology (GO) graphs revealed proteome similarities and differences between the four varieties. The presented data provide the most comprehensive experimental quinoa seed proteome map reported to date in the literature, as a starting point for more specific characterization and nutritional studies of quinoa and quinoa-containing foodstuffs.

Despite the importance of quinoa proteins in human nutrition, biomedicine, cultivar identification, quality control and authentication, studies focused on the characterization of the quinoa seed proteome are scarce. Until recently, two-dimensional gel electrophoresis (2D-GE) followed by matrix-assisted laser desorption/ionization mass spectrometry (MALDI-MS) was the most widely applied technique for visualizing the proteome profiles of quinoa seeds (Aloisi et al., 2016). However, 2D-GE has several limitations, including a weak ability to detect low-abundance proteins, a narrow linear dynamic range, long analysis times, and low sensitivity and reproducibility. Nowadays, liquid chromatography-tandem mass spectrometry (LC-MS/MS) is the method of choice for shotgun proteomics (Aizat & Hassan, 2018; Gao & Yates, 2019; Schubert, Röst, Collins, Rosenberger, & Aebersold, 2017). This approach measures proteins indirectly, through the bottom-up LC-MS/MS analysis of peptides derived from the proteolytic digestion of the original complex protein mixture. So far, only two studies have reported the shotgun proteome analysis of quinoa seeds (Burrieza, Rizzo, Moura, Silveira, & Maldonado, 2019; Capriotti et al., 2015), which limits the current understanding of the quinoa proteome and indicates that its comprehensive characterization is not straightforward.
One of the main challenges in large-scale shotgun proteomics of non-model plant species such as quinoa is the limited availability of protein sequences in the commonly used databases (Heck & Neely, 2020). In 2015, owing to the lack of complete quinoa genomic sequence and proteomic data, Capriotti et al. (2015) had to use a protein sequence database from plants of the order Caryophyllales, which included a very small number of quinoa proteins. In that study, only 4 quinoa proteins were identified, whereas the remaining 348 proteins were found by sequence homology to plant species phylogenetically close to quinoa, especially beetroot (Beta vulgaris) and spinach (Spinacia oleracea). The publication of the quinoa genome in 2017 (Jarvis et al., 2017; Zou et al., 2017) was an important breakthrough for the experimental characterization of the quinoa seed proteome. Taking advantage of this new information, the 2019 shotgun study identified a total of 337 quinoa proteins, including novel lysine-rich seed storage globulins.
Indeed, a critical point when performing shotgun proteomics is the selection of the most suitable database for each particular case (Xu, 2012). The UniProt/Swiss-Prot database (https://www.uniprot.org) is the most widely used protein sequence database, especially for studying model organisms (Bateman, 2019). The second most widely used database is that of the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov). The main advantage of the NCBI database is that it works with the Reference Sequence (RefSeq) project (O'Leary et al., 2016), which, unlike UniProt, maintains and curates new and updated genome annotations. At the time of writing, for the particular case of quinoa, the UniProt database (unreviewed section) contains 232 Chenopodium quinoa protein entries, whereas the NCBI RefSeq database contains 63,370. This gap points to the need for large-scale proteomics studies that experimentally confirm and review the quinoa data provided by the different databases.
In this study, a label-free LC-MS/MS shotgun proteomics approach with a state-of-the-art Orbitrap mass spectrometer was applied to comprehensively characterize the proteome of the most typical quinoa seed varieties, i.e., black (B), red (R), white quinoa from Peru (W) and white quinoa from Bolivia (also referred to as royal, RO). The most appropriate conditions for sample preparation, instrumentation and data analysis were carefully selected, significantly improving quinoa proteome coverage relative to previous studies and revealing differences between varieties. The proposed methodology provides the most comprehensive experimental quinoa seed proteome map reported to date in the literature, as a tool for more specific characterization and nutritional studies.

Sample preparation
Black (B), red (R) and white (W) quinoa from Peru, as well as royal white (RO) quinoa from Bolivia, were purchased from local supermarkets in Barcelona. Quinoa grains were dried in an air-current oven at 40 °C for 24 h, ground in a coffee grinder and stored at room temperature (RT) in a desiccator. Quinoa proteins were extracted as in our previous work (Galindo-Luján, Pont, Sanz-Nebot, & Benavente, 2021): 250 mg of the ground sample were mixed with 1 mL of water and 39 µL of 1 M NaOH (final pH 10.0) using a Vortex Genius 3 (Ika®, Staufen, Germany) and then incubated for 1 h at 36 °C with constant shaking in a TS-100 thermoshaker (Biosan, Riga, Latvia). Soluble proteins were separated from the insoluble residue by centrifugation at 15,000 × g for 20 min at 4 °C in a cooled Rotanta 460 centrifuge (Hettich Zentrifugen, Tuttlingen, Germany). For protein purification, the pH of the supernatant was adjusted to 5.0 with 22 µL of 1 M HCl. After centrifugation at 15,000 × g for 20 min at 4 °C, the precipitated proteins were resuspended in 1 mL of 60 mM H3BO3 (pH adjusted to 9.0 with NaOH). All pH measurements were made with a Crison 2002 potentiometer and a Crison 52-03 electrode (Crison Instruments, Barcelona, Spain).

CE-UV
A capillary electrophoresis (CE) instrument with a diode-array detector (7100 CE, Agilent Technologies, Waldbronn, Germany) was used to estimate the total amount of protein in the quinoa extracts by measuring the absorbance at 280 nm. A calibration curve was established by analyzing BSA standard solutions at concentrations between 50 and 1,000 µg/mL. BSA standards and quinoa protein extracts (three independent replicates for B, R, W and RO quinoa) were injected for 10 s at 50 mbar into a fused silica capillary of 58 cm total length (LT) × 50 μm internal diameter (i.d.) × 365 μm outer diameter (o.d.) (Polymicro Technologies, Phoenix, AZ, USA). Flow injection experiments were performed without voltage, mobilizing the sample plug by applying 50 mbar of pressure after injection. Absorbance was quantified as the height of the detected protein peaks. The total protein contents estimated by CE-UV in the quinoa extracts (average of the three analyzed replicates; relative standard deviation (%RSD) lower than 5% in all cases) were 2.2%, 2.3%, 1.7% and 2.6% (m/m) for B, R, W and RO quinoa samples, respectively.
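The BSA calibration step can be illustrated with a minimal Python sketch. It assumes a straight-line least-squares fit of peak height against concentration over the 50-1,000 µg/mL range, which the text implies but does not state explicitly; the function names and the synthetic peak heights are hypothetical and are not data from this study.

```python
import numpy as np


def fit_calibration(conc_ug_ml, peak_height):
    """Least-squares line: height = slope * conc + intercept; also returns R^2."""
    conc = np.asarray(conc_ug_ml, dtype=float)
    height = np.asarray(peak_height, dtype=float)
    slope, intercept = np.polyfit(conc, height, 1)
    predicted = slope * conc + intercept
    ss_res = np.sum((height - predicted) ** 2)
    ss_tot = np.sum((height - height.mean()) ** 2)
    return slope, intercept, 1.0 - ss_res / ss_tot


def height_to_conc(height, slope, intercept):
    """Invert the calibration line to estimate protein concentration (µg/mL)."""
    return (np.asarray(height, dtype=float) - intercept) / slope


def rsd_percent(values):
    """Relative standard deviation (%) of replicate estimates, using the sample SD."""
    values = np.asarray(values, dtype=float)
    return 100.0 * values.std(ddof=1) / values.mean()


# Hypothetical BSA standards (µg/mL) with synthetic, perfectly linear peak heights
standards = [50, 100, 250, 500, 750, 1000]
heights = [0.002 * c + 0.01 for c in standards]

slope, intercept, r2 = fit_calibration(standards, heights)
# Three hypothetical replicate extracts of one variety
extract_conc = height_to_conc([1.01, 1.03, 0.99], slope, intercept)
print(extract_conc, rsd_percent(extract_conc))
```

In practice the fitted line should only be used for peak heights that fall inside the range covered by the standards.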

SDS-PAGE
Fifteen microliters of sample solution (protein extracts from RO quinoa) were reduced and denatured with 5 µL of reducing Laemmli sample buffer (1 M Tris-HCl, pH 6.8, 4% (m/v) SDS, 20% (v/v) glycerol, 10% (v/v) β-mercaptoethanol and 1% (v/v) bromophenol blue). SDS-PAGE was performed on a Mini-PROTEAN® Tetra Cell vertical system with a PowerPac™ HC Power Supply (Bio-Rad, Hercules, CA, USA). Samples were loaded onto in-house 10% SDS-polyacrylamide gels (30% (v/v) acrylamide/bis solution, 1.5 M Tris-HCl, pH 8.8, 10% (m/v) SDS, 10% (m/v) APS and TEMED (4 µL/10 mL)). Ten microliters of protein ladder (BenchMark™ Protein Ladder) were also loaded for molecular mass calibration. Electrophoresis was performed at 120 V for 2 h at RT. The gel was fixed in 40% (v/v) ethanol and 10% (v/v) acetic acid (HAc) for 30 min, rinsed with water (3 × 5 min) and incubated in Coomassie blue staining solution for 24 h. Finally, the gel was destained with water (3 × 20 min) until adequate staining contrast was achieved.
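Molecular mass calibration with a protein ladder is commonly done with a semi-log plot, fitting log10(Mr) against relative mobility (Rf, band migration distance divided by dye-front distance). The sketch below assumes that approach, since the text does not say how Mr values were read from the gel; the ladder values are synthetic, not the BenchMark™ band sizes, and the helper names are hypothetical.

```python
import numpy as np


def relative_mobility(band_distance_mm, dye_front_mm):
    """Rf = migration distance of a band / migration distance of the dye front."""
    return np.asarray(band_distance_mm, dtype=float) / dye_front_mm


def fit_mr_calibration(rf_ladder, mr_ladder):
    """Fit log10(Mr) = slope * Rf + intercept from the ladder bands."""
    rf = np.asarray(rf_ladder, dtype=float)
    log_mr = np.log10(np.asarray(mr_ladder, dtype=float))
    slope, intercept = np.polyfit(rf, log_mr, 1)
    return slope, intercept


def estimate_mr(rf, slope, intercept):
    """Estimate Mr of an unknown band from its Rf using the fitted line."""
    return 10 ** (slope * np.asarray(rf, dtype=float) + intercept)


# Synthetic ladder that follows an exact log-linear relationship
ladder_rf = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
ladder_mr = 10 ** (5 - 1.5 * ladder_rf)

slope, intercept = fit_mr_calibration(ladder_rf, ladder_mr)
unknown_rf = relative_mobility(30.0, 60.0)  # hypothetical band at 30 mm, dye front at 60 mm
print(round(float(estimate_mr(unknown_rf, slope, intercept))))
```

The log-linear relationship holds only over the resolving range of a given gel percentage, so unknown bands should be bracketed by ladder bands.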

LC-MS/MS
Appropriate volumes of the quinoa protein extracts (from 85 to 130 µL, corresponding to 50 µg of total protein as estimated by CE-UV; three independent replicates from B, R, W and RO quinoa) were evaporated to dryness using a Savant SPD-111V SpeedVac concentrator (Thermo Fisher Scientific) and resuspended in 100 μL of ice-cold extraction buffer (25 mM HEPES (pH 8.0), 1.5 M urea, 0.02% (v/v) Triton™ X-100 and 5% (v/v) glycerol). The suspension was vortexed for 2 min and centrifuged for 30 s at 5,000 × g. Samples were reduced with 3 mM TCEP for 45 min at RT and then alkylated with 15 mM IAA for 60 min in the dark at RT. Proteolytic digestion was performed by adding 300 ng of trypsin/Lys-C mix (enzyme:protein ratio 1:167, m/m) and incubating overnight at RT under shaking at 500 × g. Digestion was stopped by adding FA (1% (v/v) final concentration), and the samples were centrifuged at 15,000 × g for 2 min. The supernatant containing the digested proteins was desalted on disposable TopTip C-18 columns (Glygen, Columbia, MD, USA) and evaporated to dryness.
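The arithmetic behind the 50 µg protein load and the 1:167 enzyme:protein ratio can be checked with a short Python sketch. The extract concentration in the example is hypothetical, and the helper names are illustrative only.

```python
def volume_for_protein_mass(target_ug, conc_ug_per_ml):
    """Volume of extract (µL) that contains target_ug of total protein."""
    return 1000.0 * target_ug / conc_ug_per_ml


def protein_per_enzyme(protein_ug, enzyme_ng):
    """Return x in 'enzyme:protein = 1:x' on a mass basis."""
    return protein_ug * 1000.0 / enzyme_ng


# Hypothetical extract at 500 µg/mL (CE-UV estimate); 50 µg load as in the protocol
volume_ul = volume_for_protein_mass(50, 500)
# 300 ng trypsin/Lys-C for 50 µg of protein
ratio = protein_per_enzyme(50, 300)
print(f"load {volume_ul:.0f} µL; enzyme:protein = 1:{ratio:.0f}")
```

For this hypothetical extract the script prints `load 100 µL; enzyme:protein = 1:167`.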
All experiments were performed on an Orbitrap Fusion™ Lumos™ (Thermo Scientific) coupled to an UltiMate 3000 RSLCnano system (Thermo Scientific). Protein digests were reconstituted in 20 μL of water containing 1% (v/v) FA and separated on a column (15 cm × 75 μm ID × 365 μm OD fused silica capillary, Polymicro Technologies) in-house packed with C18 particles (Luna C18 (2) 96-99 min, 98-2% ACN; 99-109 min, 2-2% ACN). Two μL of sample were injected. The Orbitrap parameters in ESI+ mode were as follows: ion source temperature 250 °C, spray voltage 2.1 kV, top speed mode, and full-scan MS spectra (m/z 350-2,000) acquired at a resolution of 60,000. Precursor ions were filtered according to monoisotopic precursor selection, charge state (+2 to +7) and dynamic exclusion (30 s with a ±10 ppm window). The automatic gain control (AGC) targets were 5 × 10⁵ for full scans and 1 × 10⁴ for MS/MS scans. Fragmentation was performed by collision-induced dissociation (CID) in the linear ion trap. Precursors were isolated using a 2 m/z isolation window and fragmented with a normalized collision energy of 35%.
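The precursor filters (charge state +2 to +7, dynamic exclusion for 30 s within ±10 ppm) can be illustrated with a toy Python model. This is a simplified sketch of the concept, assuming a plain list of previously selected m/z values; it is not the instrument's acquisition logic, and the class and function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def precursor_mz(neutral_mass, charge):
    """m/z of an [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def ppm_window(mz, ppm=10.0):
    """Lower and upper m/z bounds of a ±ppm tolerance window."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


class DynamicExclusion:
    """Toy exclusion list: a selected precursor is skipped for duration_s seconds."""

    def __init__(self, duration_s=30.0, ppm=10.0):
        self.duration_s = duration_s
        self.ppm = ppm
        self._entries = []  # (m/z, selection time in s); never pruned in this toy

    def is_excluded(self, mz, time_s):
        for mz0, t0 in self._entries:
            low, high = ppm_window(mz0, self.ppm)
            if low <= mz <= high and 0.0 <= time_s - t0 < self.duration_s:
                return True
        return False

    def select(self, mz, time_s, charge, min_charge=2, max_charge=7):
        """Return True if the precursor would be fragmented, then start excluding it."""
        if not (min_charge <= charge <= max_charge):
            return False  # charge-state filter
        if self.is_excluded(mz, time_s):
            return False  # dynamic exclusion filter
        self._entries.append((mz, time_s))
        return True
```

For example, after `select(1000.0, 0.0, 2)` a precursor at m/z 1000.005 (within ±10 ppm, i.e. ±0.01) is skipped at 10 s but selected again at 31 s.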

Data analysis
MaxQuant (Max Planck Institute of Biochemistry, version 1.6.17.0) (Cox & Mann, 2008) with the search engine Andromeda (Cox et al., 2011) was used for protein and peptide identification in all MS raw files (three independent replicates from B, R, W and RO quinoa). Trypsin was selected as the digestion enzyme, with a maximum of two missed cleavages, peptide charges from +2 to +7, a precursor mass tolerance of 10 ppm and a fragment mass tolerance of 0.5 Da. Methionine oxidation and N-terminal acetylation were set as variable (dynamic) modifications, and cysteine carbamidomethylation as a fixed modification. The search database was a non-redundant quinoa protein sequence FASTA file containing the 63,370 Chenopodium quinoa entries found in the NCBI RefSeq database (the FASTA file is provided as Supplementary material). The false discovery rate (FDR) was set to 0.01 for both peptide and protein identifications. Normalized label-free quantification (LFQ) values were obtained with the built-in MaxLFQ algorithm (Cox et al., 2014).
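The effect of allowing up to two missed cleavages can be illustrated with an in silico tryptic digestion. This minimal sketch assumes the classic trypsin rule (cleave after K or R, but not before P); MaxQuant also offers a "Trypsin/P" specificity that ignores the proline rule, which the `proline_rule` flag mimics. It is not MaxQuant's implementation, and the example sequence is made up.

```python
def cleavage_sites(sequence, proline_rule=True):
    """Positions after which trypsin cleaves (after K/R, optionally not before P)."""
    sites = []
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and not (proline_rule and sequence[i + 1] == "P"):
            sites.append(i + 1)
    return sites


def digest(sequence, missed_cleavages=2, proline_rule=True):
    """All peptides with up to `missed_cleavages` internal uncleaved sites."""
    bounds = [0] + cleavage_sites(sequence, proline_rule) + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        last = min(start + missed_cleavages + 2, len(bounds))
        for end in range(start + 1, last):
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


toy = "MAKPGRSTKLE"  # hypothetical sequence; K at position 3 is followed by P
print(digest(toy, missed_cleavages=0))
print(digest(toy, missed_cleavages=0, proline_rule=False))
print(len(digest(toy, missed_cleavages=2)))
```

Each extra allowed missed cleavage enlarges the candidate peptide space, which is one reason the FDR threshold is applied to the resulting identifications.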
MaxQuant normalized LFQ intensities of the identified proteins in all quinoa varieties were visualized as a heat map created with the freely available web server Heatmapper (http://www.heatmapper.ca). The identified proteins were also classified by Gene Ontology (GO) using the PANTHER classification system (http://www.pantherdb.org). However, because Chenopodium quinoa is not available in the PANTHER-GO system, which works primarily with UniProt identifiers and model organisms (Mi, Muruganujan, Casagrande, & Thomas, 2013), the sequences of the identified proteins (NCBI accession numbers, IDs) were searched with BLAST (Pertsemlidis & Fondon, 2001) against the UniProt database of Arabidopsis thaliana, a widely recognized model plant organism.
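Mapping each quinoa protein to a single Arabidopsis entry requires picking one BLAST hit per query. The sketch below assumes BLAST+ tabular output (`-outfmt 6`, default 12 columns: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), keeps the highest-bitscore hit, and uses percent identity as a stand-in for the "homology degree"; all of these choices, and the identifiers in the example, are assumptions for illustration.

```python
from statistics import mean, stdev


def best_hits(tabular_lines):
    """Keep the highest-bitscore subject for each query in BLAST -outfmt 6 lines."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        pident, bitscore = float(fields[2]), float(fields[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, pident, bitscore)
    return best


# Hypothetical tabular hits (IDs are placeholders, not real accessions)
example_lines = [
    "quinoa_q1\tarabidopsis_A\t80.0\t300\t60\t0\t1\t300\t1\t300\t1e-120\t450.0",
    "quinoa_q1\tarabidopsis_B\t60.0\t280\t112\t2\t5\t284\t3\t282\t1e-60\t210.0",
    "quinoa_q2\tarabidopsis_A\t62.0\t290\t110\t1\t2\t291\t1\t290\t1e-90\t330.0",
]
best = best_hits(example_lines)
identities = [pident for _, pident, _ in best.values()]
print(len(best), len({s for s, _, _ in best.values()}), mean(identities), stdev(identities))
```

As in the example, several quinoa proteins can map to the same Arabidopsis entry, so the number of unique UniProt IDs can be smaller than the number of quinoa proteins.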

SDS-PAGE analysis
Before the shotgun proteomics experiments, SDS-PAGE was used to compare the protein banding patterns of quinoa seed extracts obtained with three different protein extraction procedures (Aloisi et al., 2016; Capriotti et al., 2015; Galindo-Luján et al., 2021). Using an RO quinoa sample as a model, acetone precipitation, alkaline extraction with NaOH, and alkaline extraction with NaOH followed by isoelectric precipitation with HCl were tested. As shown in the gel image in Fig. 1 A-C, the protein profiles were similar for all extraction procedures, and proteins were resolved into distinct bands spanning a broad range of relative molecular masses (Mr), from 6,000 to 65,000. These results agree with previous studies, such as that of Piñuel et al. (2019), who assigned the most abundant bands, around 10,000, 15,000-35,000 and 50,000 Mr, to 2S albumins, 11S globulins and 7S globulins, respectively. Since there were no significant differences in the number of protein bands among the three extraction protocols (only acetone precipitation produced seemingly lower intensities, see Fig. 1-A), and since alkaline extraction followed by isoelectric precipitation was expected to provide the most purified protein extracts, this procedure was applied before shotgun proteomics.

LC-MS/MS analysis
The most commonly used approach for label-free shotgun proteomics is to enzymatically digest proteins into peptides, which are then analyzed by LC-MS/MS. Therefore, efficient protein digestion is critical to the successful identification of proteins. Typically, the solubilization of proteins using buffers that contain salts, denaturing agents and detergents (e.g. urea and Triton™ X-100, among others) and the efficient elimination of these additives from the resulting samples are crucial steps in shotgun proteomics (Tubaon, Haddad, & Quirino, 2017).
Another critical consideration for the successful identification of proteins is the use of a high-performance mass spectrometer. In this study, the Orbitrap Fusion™ Lumos™ mass spectrometer was used to characterize the quinoa seed proteome. Compared with earlier-generation Orbitraps (e.g., the Linear Trap Quadrupole-Orbitrap), this instrument offers significantly higher sensitivity and sequencing speed, leading to more protein identifications (Zhu et al., 2018). LC-MS/MS conditions commonly used for peptide analysis in shotgun proteomics were selected, as they are general enough to provide adequate chromatographic separation and detection sensitivity in positive ESI mode for a wide range of complex protein digests on different mass spectrometers. Under these conditions, three independent protein extracts from each of B, R, W and RO quinoa were analyzed by LC-MS/MS, and the raw data files were processed for peptide and protein identification.

Data analysis
Data analysis in shotgun proteomics is much more challenging than for other high-throughput technologies and remains a principal bottleneck in proteomics (Lereim, Oveland, Berven, Vaudel, & Barsnes, 2016; Nesvizhskii, Vitek, & Aebersold, 2007; Sinitcyn, Rudolph, & Cox, 2018). Due to the high complexity of the samples resulting from the enzymatic digestion of proteins, computational proteomics has become a key research area for protein identification and quantification. MaxQuant is a computational proteomics workflow that addresses the above tasks with a focus on high accuracy and quantitative data (Cox et al., 2011, 2014; Cox & Mann, 2008).
In this study, MaxQuant/Andromeda in combination with a non-redundant quinoa protein sequence FASTA file (63,370 entries) from the NCBI RefSeq database was applied for protein identification and LFQ of all the raw data files. Supplementary Table S-1 shows the protein group level, the accession number (ID), the protein name, the Mr, the Andromeda score, the number of peptides, the sequence coverage and the normalized LFQ intensity of the identified proteins in W, B, R and RO samples (these parameters are defined in a footnote to Supplementary Table S-1). It is worth mentioning that, for every quinoa variety, only proteins found in at least two of the three replicates are reported. Additionally, the number of peptides, the sequence coverage and the normalized LFQ intensity for each quinoa variety are presented as averages over the replicate protein extracts (in all cases, the relative standard deviation (%RSD) was lower than 10%). As shown in Supplementary Table S-1, a total of 1,211 proteins (considering all quinoa varieties) were identified with the described label-free shotgun proteomics approach. This is significantly more than the 337 proteins reported in the previous 2019 shotgun study of quinoa seeds, which used a different protein extraction protocol, mass spectrometer and data analysis approach, with a Chenopodium quinoa NCBI protein database of 63,459 entries. In Supplementary Table S-1, proteins are ordered by Andromeda score, which is considered the most important parameter reflecting the reliability of identification in this environment (Cox et al., 2011; Cox & Mann, 2008). The Andromeda scores of the identified proteins ranged from 323 to 2, with higher scores indicating more reliable identifications.
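The reporting rule (a protein is kept for a variety only if it is quantified in at least two of the three replicates, then summarized as a mean with %RSD) can be sketched in Python. The sketch assumes that a missing LFQ value appears as 0, None or NaN; the protein names and intensities are hypothetical.

```python
import math
from statistics import mean, stdev


def is_quantified(value):
    """Treat None, NaN and zero as 'not quantified' (assumed convention)."""
    return value is not None and not math.isnan(value) and value > 0


def summarize_replicates(lfq_by_protein, min_valid=2):
    """Keep proteins quantified in >= min_valid replicates; return {protein: (mean, %RSD)}."""
    summary = {}
    for protein, values in lfq_by_protein.items():
        valid = [v for v in values if is_quantified(v)]
        if len(valid) < min_valid:
            continue  # e.g. detected in only one of three replicates
        avg = mean(valid)
        rsd = 100.0 * stdev(valid) / avg if len(valid) > 1 else float("nan")
        summary[protein] = (avg, rsd)
    return summary


# Hypothetical LFQ intensities for one variety (three replicates per protein)
replicates = {
    "prot_A": [1.00e8, 1.10e8, 0.0],
    "prot_B": [5.0e6, 0.0, 0.0],
    "prot_C": [2.0e7, 2.0e7, 2.0e7],
}
result = summarize_replicates(replicates)
print(result)
```

Here prot_B is dropped (one valid replicate), while prot_A is kept with a mean of 1.05 × 10⁸ and an RSD of about 6.7%.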
In addition, some protein group levels contain several proteins that cannot be distinguished based on their peptide content, for instance, the proteins in group level 3 (IDs XP_021715439.1 and XP_021720768.1, see Supplementary Table S-1). MaxQuant reports identification and quantification at the protein group level to avoid overcounting identifications and ambiguous quantification.
Fig. 3. Heat map obtained using the row z-score normalized LFQ intensities of the identified quinoa proteins in the four analyzed varieties, W, B, R and RO.
The Venn diagram in Fig. 2 shows the relationships between the identified proteins for W, B, R and RO quinoa. As can be seen in this figure, similar total numbers of proteins were identified in the W, B, R and RO quinoa varieties (i.e., 1,073, 997, 982 and 964, respectively, out of a total of 1,211 proteins identified across the four varieties). Of these, 805 proteins (66% of the total) were identified in all varieties, while 406 proteins (34% of the total) were present in only some of them. Regarding proteins identified in only one variety, 88 proteins were exclusively identified in W, 30 in B, 21 in R and 17 in RO quinoa (see Fig. 2 and Supplementary Table S-1). These observations suggested differences between the proteomes of the four studied quinoa varieties.
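The Venn diagram counts (total identified, shared by all varieties, and exclusive to one variety) follow from basic set operations on the protein IDs of each variety. The minimal sketch below uses small hypothetical sets rather than the study's protein lists.

```python
def venn_summary(protein_sets):
    """Union, core (shared by all) and variety-exclusive protein sets."""
    union = set().union(*protein_sets.values())
    core = set.intersection(*protein_sets.values())
    exclusive = {}
    for name, proteins in protein_sets.items():
        others = set().union(*(s for other, s in protein_sets.items() if other != name))
        exclusive[name] = proteins - others
    return union, core, exclusive


# Hypothetical protein IDs per variety
varieties = {
    "W": {"p1", "p2", "p3", "p_w"},
    "B": {"p1", "p2", "p_b"},
    "R": {"p1", "p2"},
    "RO": {"p1", "p3"},
}
union, core, exclusive = venn_summary(varieties)
partial = union - core  # proteins present in some, but not all, varieties
print(len(union), len(core), len(partial), {k: len(v) for k, v in exclusive.items()})
```

Applied to the real identification lists, the same operations give the totals, the shared core and the per-variety exclusive counts shown in Fig. 2.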
Although the Venn diagram allowed general relationships in the number of identified proteins to be visualized, differences at the concentration level had to be considered for confident discrimination. A Euclidean distance heat map (Fig. 3) was constructed from the data matrix of average normalized LFQ intensities of the 1,211 identified proteins (rows) in the four quinoa varieties (columns). Proteins were filtered for complete observations (805 proteins), and z-scores (normalized per protein) were calculated by subtracting the row mean and dividing by the row standard deviation. In a heat map, rows and columns are reordered so that those with similar profiles are placed close together, and each row z-score in the data matrix is displayed as a color, making it possible to view relationships and patterns graphically (Benno Haarman et al., 2015; Key, 2012; Krentzman, Robinson, Jester, & Perron, 2011). As can be observed in Fig. 3, each variety presented a characteristic concentration profile, with green, red and black boxes representing proteins with higher, lower and unchanged relative abundance, respectively. Most heat maps use an agglomerative hierarchical clustering algorithm to group the data according to the observed profiles and display this information as a dendrogram. When two clusters are merged, a line is drawn connecting them at a height corresponding to the distance (dissimilarity) between them. As shown in Fig. 3, B and R quinoa samples were clustered together, followed by RO and, finally, W quinoa, which was the least closely related variety based on the quantified protein groups. The similarities and differences among the proteomic profiles of the studied quinoa varieties could be attributed to the genetic features of each variety, as well as to the agroecological conditions of cultivation.
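The heat map preprocessing and clustering described above (complete-case filtering, per-protein z-scores, Euclidean distance, agglomerative clustering of varieties) can be sketched with NumPy and SciPy. The text does not state the linkage method or whether the sample or population standard deviation was used, so average linkage and the sample SD (ddof=1) are assumptions here, and the intensity matrix is hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage


def complete_cases(matrix):
    """Keep only proteins (rows) quantified in every variety (no NaN)."""
    matrix = np.asarray(matrix, dtype=float)
    return matrix[~np.isnan(matrix).any(axis=1)]


def row_zscore(matrix):
    """Per-protein z-score; rows with zero SD would give NaN and should be removed."""
    row_mean = matrix.mean(axis=1, keepdims=True)
    row_sd = matrix.std(axis=1, ddof=1, keepdims=True)
    return (matrix - row_mean) / row_sd


def cluster_varieties(z, labels, method="average"):
    """Agglomerative clustering of columns (varieties) on Euclidean distance."""
    tree = linkage(z.T, method=method, metric="euclidean")
    order = dendrogram(tree, no_plot=True, labels=labels)["ivl"]
    return tree, order


varieties = ["W", "B", "R", "RO"]
lfq = np.array([
    [10.0, 2.0, 2.1, 5.0],
    [1.0, 8.0, 8.2, 4.0],
    [5.0, 5.0, 5.1, 9.0],
    [3.0, np.nan, 4.0, 2.0],  # incomplete row, dropped before clustering
])
z = row_zscore(complete_cases(lfq))
tree, leaf_order = cluster_varieties(z, varieties)
print(tree[0, :2], leaf_order)  # first merge joins the two most similar varieties
```

In this toy matrix the B and R columns are nearly identical, so they form the first cluster, and the merge heights in `tree[:, 2]` are the Euclidean distances drawn in the dendrogram.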
With regard to the observed variety clusters, the results are complementary to our previous work based on protein fingerprinting by capillary electrophoresis with ultraviolet diode-array detection (CE-UV-DAD) and advanced chemometrics (Galindo-Luján et al., 2021), in which W and RO quinoa showed a high degree of similarity and were discriminated from B and R quinoa. The differences between the two approaches may be attributed to the different information provided by each analytical technique. While the CE-UV-DAD results mainly reflected global differences at the level of high-abundance proteins, shotgun proteomics provided detailed information on the identity and abundance of the proteins in the proteomic profiles, offering deeper biological insight.
To obtain a global overview of the types of identified proteins, they were also classified into four broad Gene Ontology (GO) categories using the PANTHER-GO system: cellular component, molecular function, biological process and protein class (see Fig. 4 A-D, respectively). However, the PANTHER-GO system works primarily with UniProt identifiers, and the Chenopodium quinoa UniProt database contains a limited amount of data compared with the NCBI database. Therefore, the sequences of the identified proteins (NCBI IDs) were searched with BLAST against the UniProt database of Arabidopsis thaliana, a widely recognized model plant organism. After the BLAST search, the uncharacterized quinoa proteins (85, see Supplementary Table S-1) did not match Arabidopsis thaliana entries in the UniProt database and were excluded from GO annotation. Supplementary Table S-2 shows the 1,126 quinoa proteins considered for GO annotation and their correspondence with 1,085 UniProt IDs from Arabidopsis thaliana (average homology degree 73% ± 16% (mean ± s)), which were subjected to PANTHER-GO analysis and classification. In the cellular component category (Fig. 4-A), 489 hits were localized in a cellular anatomical entity (42%), 481 hits in an intracellular part (41%) and 204 hits in a protein-containing complex (17%). In the molecular function category (Fig. 4-B), 331 hits were associated with catalytic activity (46%) and 254 with binding (35%). The remaining hits were associated with structural molecule activity (83 hits), translation regulatory activity (19 hits), transporter activity (19 hits) and molecular function regulation (17 hits) (11%, 3%, 3% and 2%, respectively, Fig. 4-B). In the biological process category, the highest percentage of identified proteins (80%) was involved in cellular and metabolic processes (431 and 349 hits, respectively, Fig. 4-C).
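The GO percentages can be checked against the reported hit counts. The sketch below assumes that each percentage is a share of the total number of hits within its category (PANTHER hits can exceed the number of proteins, because one protein may fall into several classes); with that assumption the counts above reproduce the reported rounded percentages for the cellular component and molecular function categories. The category labels are shortened for readability.

```python
def percentages(hits):
    """Share (%) of each GO term within one category, rounded to integers."""
    total = sum(hits.values())
    return {term: round(100 * n / total) for term, n in hits.items()}


cellular_component = {
    "cellular anatomical entity": 489,
    "intracellular": 481,
    "protein-containing complex": 204,
}
molecular_function = {
    "catalytic activity": 331,
    "binding": 254,
    "structural molecule activity": 83,
    "translation regulator activity": 19,
    "transporter activity": 19,
    "molecular function regulator": 17,
}
print(percentages(cellular_component))
print(percentages(molecular_function))
```

The biological process and protein class categories cannot be checked the same way, because only part of their hit counts is listed in the text.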
The remaining hits were associated with response to stimulus (65 hits), biological regulation (59 hits), localization (51 hits) and other biological processes representing less than 3%. Finally, in the protein class category, 65% of the identified proteins were classified as metabolite interconversion enzymes and translational proteins (286 and 144 hits, respectively, Fig. 4-D). The remaining hits were classified as protein modifying enzymes (69 hits), chaperones (38 hits), transporters (30 hits), nucleic acid metabolism proteins (21 hits), chromatin/chromatin-binding or regulatory proteins (16 hits), cytoskeletal proteins (12 hits), protein-binding activity modulators (12 hits), membrane traffic proteins (12 hits), and other classes representing less than 4%. To investigate the characteristic features of the small number of proteins exclusively identified in W, B, R or RO quinoa (156 proteins, see the Venn diagram in Fig. 2), GO graphs were also generated for these proteins (Supplementary Figure S-1). However, no significant differences were observed with respect to the GO graphs of the total proteins (Fig. 4), suggesting that the most differential proteins of the four quinoa varieties were related to similar cellular components, molecular functions, biological processes and protein classes.

Conclusions
We presented a label-free LC-MS/MS shotgun proteomics approach with a state-of-the-art Orbitrap mass spectrometer for the characterization and differentiation of the most commonly commercialized quinoa seed varieties (B, R, W and RO quinoa). A total of 1,073, 997, 982 and 964 quinoa proteins from a non-redundant NCBI quinoa database were identified in the W, B, R and RO varieties, respectively (1,211 identified proteins in total, 85 of them uncharacterized), significantly improving quinoa proteome coverage relative to previous studies. To investigate relationships between the B, R, W and RO quinoa varieties at the proteome level, Venn diagrams, heat maps and GO classification graphs were generated. As indicated above, a similar number of proteins was identified for each quinoa variety (~1,000), and the varieties were similar with regard to GO annotation. However, the characteristic concentration profiles of the identified proteins were useful to find relationships and discriminate between the four varieties. The study provides the most comprehensive experimental quinoa proteome map reported to date in the literature, which can be used for more specific characterization and nutritional studies of the identified proteins. Proteomic profiling of quinoa seeds may be used for quality control of quinoa and quinoa-containing foodstuffs, as well as to aid in enhancing the nutritional value or technological properties of quinoa seeds. Additionally, it may find applicability in improving industrial processing procedures or cultivar yields under different agroecological conditions.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.