Proteomics and Its Use in Obtaining Superior Soybean Genotypes


Introduction
Soybean (Glycine max L. Merrill) is one of the most important and most widely cultivated crops in the world. Protein accounts for around 40% of the dry matter of the grain, and oil for around 21%. This high protein and oil content has made the grain a product of great importance for the industrial sector, whether for food, cosmetics or, more recently, biofuels. Soybean breeding programs directed toward these uses therefore become ever more important, together with breeding for agronomic characteristics that allow greater productivity while remaining sustainable in the environment in which the crop is produced.
The sequencing of the soybean genome [1] facilitated identification of the genetic basis of traits and led to advances in obtaining improved cultivars through knowledge of the complete sequence of the genes. Nevertheless, this information is not sufficient to identify which proteins are actually being expressed in the cell at a given moment and under a given condition since, through alternative splicing, a single gene may give rise to different proteins. Thus, complementary DNA (cDNA) and messenger RNA (mRNA) have come to be the main focus of study for obtaining information on gene expression, that is, the transcriptome. However, because of post-translational regulation mechanisms, the quantity of expressed protein is not necessarily proportional to the quantity of its corresponding mRNA, which often raises questions regarding the role of a given gene in cellular metabolism.
The reason for this is that control of gene expression occurs from mRNA transcription up to post-translational modifications like glycosylation and phosphorylation, among other processes, which alter protein activity (Figure 1).
In recent years, proteomics, one of the dimensions of the post-genome era [2], has emerged to complement the information obtained from genome sequencing and transcriptome analysis. It offers a set of highly powerful techniques for separation and identification of proteins in biological samples and, because it represents the link between the genotype and the phenotype of an organism, allows better understanding of the networks of cellular operation and regulation.
For these reasons, proteomic analysis is now one of the most efficient means for functional study of the genes and genomes of complex organisms [3]. It has generated new data, as well as validated, complemented and even corrected information obtained through other approaches, thus contributing to better understanding of plant biology. Proteomics involves the study of the entire set of proteins expressed by the genome of a cell, or only of those that are expressed differentially under specific conditions. It is also directed to the set of protein isoforms and post-translational modifications, to the interactions among them, and to the structural description of molecules and their complexes.
Two-dimensional electrophoresis and mass spectrometry are the core technologies of proteomics, although new methodologies are being applied to plants for specific studies [4,5,6]. Among the most recent proteomic techniques are Difference Gel Electrophoresis (DIGE) and Multidimensional Protein Identification Technology (MudPIT), used in separation of proteins from a complex mixture. Other methods, such as Stable Isotope Labeling by Amino Acids in Cell Culture (SILAC), Isotope Coded Affinity Tag (ICAT) and Isobaric Tag for Relative and Absolute Quantitation (iTRAQ), are based on labeling with isotopes for quantification of molecules by mass spectrometry.
Although research in this area is recent, diverse studies of soybeans using proteomic tools are being performed throughout the world, showing this to be a promising area for selection of genotypes for genetic breeding programs [7,8]. Moreover, the study of plant responses to infection by pathogens has supplied significant data for understanding the signaling process that triggers the defense response in plants [9]. There are also studies characterizing the proteome of plants in response to different stress conditions arising from both abiotic factors [10] and biotic factors [11]. These comparative studies of genotypes that contrast for a given type of stress allow identification of the proteins that respond to stress by means of changes in their levels of expression. Once these molecules and their respective functions are identified, breeding work can be directed and continued only with those molecules that perform roles related to the characteristic of stress tolerance. For that reason, it is essential to cross the proteomic data with information obtained by genomics, transcriptomics and metabolomics, so as to verify the correlation of the candidate proteins with the desired characteristic.
In relation to products derived from genetically modified foods, proteomic techniques have been applied because they allow a broad approach and the analysis of many variables simultaneously in a single sample. There are also studies of the proteome expressed during plant development, as well as research in which soybeans have been the target of investigations regarding nutritional, toxicological and allergenic aspects, above all in genetically modified varieties [12]. This has increased the use of proteomics in biosafety studies. In this context, the objective of this chapter is to present the main technologies used in proteomic studies in diverse areas of activity, as well as the main scientific results obtained in the search for superior soybean genotypes.

Technologies used in proteomic studies
Execution of a proteomic study involves the integration of many technologies that span the fields of molecular biology, biochemistry, physiology, statistics and bioinformatics, among other areas. The key steps in this type of study are separation of complex mixtures of proteins and their identification.
Separation is performed by electrophoresis, a term created by Michaelis in 1909. The first electrophoresis of proteins (Figure 2) was performed in 1937. Alfenas (1998) [14] explains that electrophoresis aims at separation of molecules according to their electrical charges, their molecular weights and their conformations, in porous supports and appropriate buffers, under the influence of a continuous electrical field. Molecules with a preponderance of negative charges migrate in the electrical field to the positive pole (anode), and molecules with an excess of positive charges migrate to the negative pole (cathode). The preponderant charge of a protein molecule depends on its amino acid composition.
Many of the technologies currently used in proteomics, such as electrophoresis, were developed long before proteomics itself. Nevertheless, it was the advance in protein sequencing technology by means of mass spectrometry that allowed the emergence and development of the field [15].
Proteomic studies may be performed by means of techniques like two-dimensional electrophoresis in polyacrylamide gel (2D PAGE) followed by mass spectrometry (MS) (Figure 3) or, more recently, by combinations of ionization and chromatographic methods, among others, which further increase detection sensitivity. Nevertheless, the point of departure is still the display of a large number of proteins from a cell line or organism in two-dimensional polyacrylamide gels [16,17,18].

Two-dimensional polyacrylamide gel electrophoresis (2D PAGE)
Two-dimensional polyacrylamide gel electrophoresis is an analytical method capable of separating hundreds of proteins in a single analytical run. The sample is submitted to an electrical field for separation in two dimensions. In the first dimension, separation occurs through isoelectric focusing, in which the proteins are physically separated according to their respective isoelectric points on a polyacrylamide strip with a continuous, known pH gradient (IPG, immobilized pH gradient) submitted to increasing voltage. In the second dimension, the focused proteins are submitted to polyacrylamide gel electrophoresis in the presence of SDS (SDS-PAGE) for separation according to their molecular masses (Figure 4). Thus, the technique separates proteins by both charge and mass.
The result of two-dimensional electrophoresis is a profile of spot distribution formed by single proteins or simple mixtures of proteins [21]. Each spot visualized in the gel may be considered as an orthogonal coordinate of a protein that migrated specifically in accordance with its isoelectric point (x axis) and its molecular mass (y axis), as shown in Figure 4.
The next step consists of staining the gel with silver, Coomassie blue, fluorescent dyes, radioactive labeling or specific markers for phosphoproteins and glycoproteins, among others. This allows visualization of the protein expression pattern and photodocumentation of the gel (Figure 5). After that, selected spots are excised from the gel and digested and, finally, the proteins of interest are identified by mass spectrometry integrated with a bioinformatics tool. Two-dimensional electrophoresis gels reflect the protein expression pattern of the biological sample analyzed and allow detection of a difference of even a single amino acid between two isoforms, or of covalent modifications in the same protein, through a change in the position of the spot.
It is important to highlight that each sample, depending on its nature, requires a specific type of processing for extraction and focusing. Users should therefore check related publications beforehand for the protocols and methodologies that best suit their experimental needs. Some limitations are associated with two-dimensional electrophoresis, such as low reproducibility and limited scope for automation. Reproducibility may be increased by defining optimal conditions for the electrophoresis, whereas automation is possible only for the analysis of gels. Gel analysis software detects the spots, measures their volumes and identifies those expressed differentially, inferring a relative quantification of the expression of a protein in comparison to the same spot in another gel [22]. Thus, by a process of subtraction, the differences among samples are revealed, for example, the presence, absence or intensity of the proteins. The proteins of interest may then be characterized by their isoelectric point and apparent molecular weight, as determined from the two-dimensional gels [23].
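The x coordinate of a spot can be illustrated with a small calculation. The sketch below estimates the isoelectric point of a peptide as the pH at which its net charge is zero, using the Henderson-Hasselbalch relation and a bisection search. The pKa values are illustrative assumptions (published pKa sets differ slightly), and the function names are hypothetical; this is not the procedure of any particular gel analysis package.

```python
# Illustrative pKa values (assumptions; published pKa sets differ slightly).
PKA_POSITIVE = {"K": 10.8, "R": 12.5, "H": 6.5}            # +1 when protonated
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}   # -1 when deprotonated
PKA_N_TERM = 8.6
PKA_C_TERM = 3.6


def net_charge(sequence: str, ph: float) -> float:
    """Approximate net charge of a peptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))     # N-terminal amine
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))    # C-terminal carboxyl
    for residue in sequence:
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """Bisection search for the pH at which the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positively charged: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    for peptide in ("DDDD", "AGKLLER", "KKKK"):   # toy sequences
        print(peptide, round(isoelectric_point(peptide), 2))
```

Acidic sequences give a low value and basic sequences a high value, which corresponds to where each protein would stop migrating along the pH gradient of the strip.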
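The comparison of spots between gels can be sketched in a few lines. The example below assumes that spots have already been detected and matched, so each gel is a mapping from a spot identifier to its volume; the normalization to total spot volume, the threshold and all names are assumptions chosen for illustration rather than the behavior of a specific software package.

```python
from typing import Dict


def normalize_volumes(spot_volumes: Dict[str, float]) -> Dict[str, float]:
    """Express each spot volume as a fraction of the total volume of its gel."""
    if any(volume <= 0 for volume in spot_volumes.values()):
        raise ValueError("spot volumes must be positive")
    total = sum(spot_volumes.values())
    return {spot: volume / total for spot, volume in spot_volumes.items()}


def compare_gels(control: Dict[str, float],
                 treated: Dict[str, float],
                 threshold: float = 2.0) -> Dict[str, str]:
    """Classify every spot by presence, absence or change in relative volume."""
    control_n = normalize_volumes(control)
    treated_n = normalize_volumes(treated)
    report = {}
    for spot in sorted(set(control_n) | set(treated_n)):
        if spot not in treated_n:
            report[spot] = "absent in treated"
        elif spot not in control_n:
            report[spot] = "absent in control"
        else:
            ratio = treated_n[spot] / control_n[spot]
            if ratio >= threshold:
                report[spot] = "increased"
            elif ratio <= 1.0 / threshold:
                report[spot] = "decreased"
            else:
                report[spot] = "unchanged"
    return report


if __name__ == "__main__":
    # Hypothetical spot volumes in arbitrary units.
    control_gel = {"s1": 100.0, "s2": 100.0, "s3": 50.0, "s5": 250.0}
    treated_gel = {"s1": 200.0, "s2": 25.0, "s4": 25.0, "s5": 250.0}
    for spot, status in compare_gels(control_gel, treated_gel, threshold=1.5).items():
        print(spot, status)
```

Normalizing to the total volume of each gel compensates for differences in protein load or staining between gels, which is one common reason why raw volumes are not compared directly.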

Differential in gel electrophoresis (DIGE)
An efficient procedure for reducing gel-to-gel variation is differential in gel electrophoresis, or DIGE (Figure 6), which allows analysis of up to three proteomes in a single gel. Each gel thus carries one internal standard common to all the gels and two different samples, each labeled with a distinct fluorophore (CyDye) [25]. That way, each image shows only the proteins labeled with the corresponding fluorophore. In addition, this labeling has a broad dynamic range of detection and greater sensitivity than silver staining of gels, allowing quantitative proteomic studies to be performed with greater precision, accuracy and sensitivity [26].
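The role of the internal standard can be illustrated as follows. In this minimal sketch, each spot volume of a sample is divided by the volume of the same spot in the internal standard channel of the same gel, and only these standardized values are compared across gels. The data layout and function names are hypothetical, and the numbers are invented for the example.

```python
import math
from typing import Dict

# spot id -> {"standard": volume in the internal standard channel,
#             "sample": volume in the sample channel}
Gel = Dict[str, Dict[str, float]]


def standardized_abundance(gel: Gel, spot: str) -> float:
    """Sample volume relative to the internal standard on the same gel."""
    volumes = gel[spot]
    if volumes["standard"] <= 0 or volumes["sample"] <= 0:
        raise ValueError("volumes must be positive")
    return volumes["sample"] / volumes["standard"]


def log2_fold_change(gel_a: Gel, gel_b: Gel, spot: str) -> float:
    """log2 ratio (gel_b over gel_a) of standardized abundances for one spot."""
    return math.log2(standardized_abundance(gel_b, spot)
                     / standardized_abundance(gel_a, spot))


if __name__ == "__main__":
    # The second gel gave half the overall signal, so raw sample volumes look equal.
    control_gel = {"spot_17": {"standard": 1000.0, "sample": 1000.0}}
    treated_gel = {"spot_17": {"standard": 500.0, "sample": 1000.0}}
    print(log2_fold_change(control_gel, treated_gel, "spot_17"))
```

In this invented case the raw sample volumes are identical on both gels, yet the standardized values differ by a factor of two, which is the kind of gel-to-gel effect the common standard is meant to correct.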

Liquid chromatography
Proteins may also be separated by liquid chromatography. The sample, for example a mixture of peptides generated by proteolytic digestion of a protein extract, passes through a first separation by liquid chromatography, and the enriched peptide fractions are collected and applied to the spectrometer. As complete automation is the main target of methods for large scale analyses, gel-free separation methods were developed in which reverse phase liquid chromatography is coupled to tandem mass spectrometry (LC/MS/MS). Figure 7 shows the operational and equipment sequence involved in a typical LC/MS/MS analysis.
Greater automation is possible with multidimensional liquid chromatography, which exploits different characteristics of the proteins in columns of distinct properties or in a single two-phase column [29]. The fraction eluted from the first column is introduced directly into the second column, which may be directly connected to the mass spectrometer. This technique, called MudPIT, belongs to the context of shotgun proteomics, in which greater resolution of proteomes is possible, facilitating identification of the less abundant proteins frequently lost when gels are used [30].

Protein identification methods
After separation of the proteins, the next stage consists of their characterization and identification using mass spectrometry, a technique in which the ratio between the mass and the charge (m/z) of ionized molecules in the gas phase is measured. In general, a mass spectrometer consists of an ionization source, a mass analyzer, a detector and a data acquisition system.
The great variety of spectrometers found on the market is the result of different combinations of ionization sources and mass analyzers, which provide different levels of sensitivity and accuracy in the results. At the ionization source, the molecules are ionized and transferred to the gas phase. In the mass analyzer, the ions formed are separated in accordance with their m/z ratios and later detected, usually by an electron multiplier [31].
With the development of ever more specialized equipment for proteins, mass spectrometry has become a revolutionary tool in modern protein chemistry. This technology has allowed identification of proteins by a methodology called peptide mass fingerprinting. Rocha et al. (2003) [3] state that this methodology is based on digestion of the protein to be identified by a proteolytic enzyme, for example trypsin, producing fragments called peptides. The masses of these peptides, which are determined with great accuracy (0.1 to 0.5 Da) by mass spectrometry, form a kind of fingerprint of the protein. Special software compares the peptide mass fingerprint of the protein to be identified with those theoretically generated for all the protein sequences present in the databases. If the sequence of the unknown protein is in the database, it will be identified immediately [32].
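The m/z value measured by the analyzer can be related to the mass of the molecule with a short calculation. The sketch below assumes positive-mode ionization by protonation, so that an ion with charge z carries z additional protons; the function names are hypothetical.

```python
PROTON_MASS = 1.007276  # Da


def mz_from_mass(neutral_mass: float, charge: int) -> float:
    """m/z of a molecule of given neutral mass carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz: float, charge: int) -> float:
    """Neutral mass recovered from an observed m/z and its charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - charge * PROTON_MASS


if __name__ == "__main__":
    peptide_mass = 1500.0  # hypothetical neutral mass in Da
    for z in (1, 2, 3):
        print(z, round(mz_from_mass(peptide_mass, z), 4))
```

The same peptide therefore appears at different m/z positions depending on its charge state, which is why the charge has to be known or inferred before a mass is reported.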
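Peptide mass fingerprinting as described above can be sketched as follows: each database sequence is digested in silico with trypsin (cleavage after K or R, except before P), the monoisotopic masses of the resulting peptides are calculated, and the candidate whose theoretical masses match the most observed masses is returned. The database, the sequences and the scoring by simple match count are assumptions made for illustration; real search software uses more elaborate scoring and allows for missed cleavages and modifications.

```python
from typing import Dict, List

# Monoisotopic residue masses in Da.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide (free N and C termini)


def tryptic_digest(sequence: str) -> List[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_count(observed: List[float], sequence: str, tolerance: float) -> int:
    """Number of observed masses explained by the theoretical digest."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(1 for mass in observed
               if any(abs(mass - t) <= tolerance for t in theoretical))


def identify(observed: List[float], database: Dict[str, str],
             tolerance: float = 0.5) -> str:
    """Return the database entry with the most matching peptide masses."""
    return max(database,
               key=lambda name: match_count(observed, database[name], tolerance))


if __name__ == "__main__":
    toy_database = {"protA": "AGKLLERVVK", "protB": "MSTKGGRFFK"}  # invented
    observed_masses = [274.2, 529.3, 344.2]                        # invented
    print(identify(observed_masses, toy_database))
```

The default tolerance of 0.5 Da reflects the upper end of the accuracy range quoted in the text; a tighter tolerance reduces chance matches when the database is large.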

Relative protein quantification
Large scale protein quantification methods make it possible to estimate relative expression by labeling with radioactive isotopes, fluorescent dyes or light/heavy stable isotopes, so that the same protein can be quantified in relative terms among differently labeled samples. Some of the most used isotopic labeling approaches are ICAT (isotope-coded affinity tag), iTRAQ (isobaric tags for relative and absolute quantitation) and H₂¹⁸O.
ICAT consists of the addition of a tag that has affinity for cysteine residues and that carries either eight atoms of hydrogen or eight atoms of deuterium. One sample is labeled with the tag containing hydrogen and the other sample with the tag containing deuterium. After digestion of the proteins, the resulting peptides are identified by mass spectrometry. The same peptide labeled in the two samples appears as a pair of peaks with distinct m/z values due to the type of isotope it carries, and the ratio between the areas of the two peaks is a relative measure of the expression of that protein. According to Yi & Goodlett (2003) [33], the main problems associated with this technique are the need for cysteine residues, the high cost of the reagents and the longer time necessary for sequencing.
The iTRAQ technique also uses labeling with tags followed by identification by mass spectrometry. The tags bind to all free amino groups, that is, the N terminus of every peptide and the side chains of internal lysine residues. They differ in the reporter group they carry, which may have 114, 115, 116 or 117 Da, thus allowing proteins to be quantified in up to four samples at the same time. Relative quantification is likewise based on intensity ratios, here of the reporter ions, but high cost has restricted its use [34].
The aforementioned techniques require specific and expensive reagents. Nevertheless, the same goal may be achieved with a simpler labeling method in which the proteins are labeled with one or two atoms of ¹⁸O. These are incorporated at the carboxyl terminus by simply supplying a solution with H₂O for one sample and a solution with H₂¹⁸O for the other sample. The relative abundance of the peptide pairs, which differ by 2 Da for each ¹⁸O atom incorporated, is then estimated [35].
Another quantification technique is stable isotope labeling by amino acids in cell culture (SILAC) which, together with mass spectrometry and bioinformatics resources, has proven to be well suited to proteomic studies. It detects differences in the abundance of proteins among cell cultures by means of isotopic labeling of proteins. Labeling with stable isotopes is obtained by supplying isotopically enriched amino acids to one cell culture and natural amino acids to the culture to be compared (Figure 8).
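The pairing of light and heavy peaks can be sketched as below. The example assumes a list of already deconvoluted peaks given as (neutral mass, area), and takes the mass difference per tag as eight times the difference between a deuterium and a hydrogen atom; the tolerance, the peak values and the function name are hypothetical.

```python
from typing import List, Tuple

# Eight deuterium atoms replace eight hydrogen atoms in the heavy tag.
MASS_SHIFT_PER_TAG = 8 * (2.014102 - 1.007825)  # about 8.05 Da


def heavy_to_light_ratios(peaks: List[Tuple[float, float]],
                          n_cysteines: int = 1,
                          tolerance: float = 0.02) -> List[Tuple[float, float]]:
    """Find light/heavy peak pairs and return (light mass, heavy/light area ratio)."""
    expected_shift = n_cysteines * MASS_SHIFT_PER_TAG
    ratios = []
    for light_mass, light_area in peaks:
        for heavy_mass, heavy_area in peaks:
            shift = heavy_mass - light_mass
            if abs(shift - expected_shift) <= tolerance and light_area > 0:
                ratios.append((light_mass, heavy_area / light_area))
    return ratios


if __name__ == "__main__":
    # Invented peaks: one labeled pair and one unrelated peak.
    example_peaks = [(1000.00, 200.0), (1008.05, 400.0), (1500.00, 50.0)]
    print(heavy_to_light_ratios(example_peaks))
```

A peptide with more than one cysteine carries more than one tag, so the expected spacing of the pair grows accordingly, which is what the `n_cysteines` parameter stands for in this sketch.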
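For the four reporter groups, relative quantification amounts to comparing the signals at 114, 115, 116 and 117. The sketch below expresses each channel relative to a reference channel; the choice of 114 as reference, the intensities and the function name are assumptions for illustration only.

```python
from typing import Dict

REPORTER_CHANNELS = (114, 115, 116, 117)


def reporter_ratios(intensities: Dict[int, float],
                    reference: int = 114) -> Dict[int, float]:
    """Intensity of each reporter channel relative to the reference channel."""
    missing = [c for c in REPORTER_CHANNELS if c not in intensities]
    if missing:
        raise ValueError(f"missing reporter channels: {missing}")
    if intensities[reference] <= 0:
        raise ValueError("reference intensity must be positive")
    return {channel: intensities[channel] / intensities[reference]
            for channel in REPORTER_CHANNELS}


if __name__ == "__main__":
    # Invented reporter intensities for one peptide in four samples.
    print(reporter_ratios({114: 1000.0, 115: 2000.0, 116: 500.0, 117: 1000.0}))
```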

Analysis of post-translational modifications (PTMs)
Another area of great interest in plant proteomics is the characterization of post-translational modifications, or PTMs, which are essential for proteins to play their roles in the varied cell events and which produce different proteins from the same gene.
These modifications occur at specific sites in the proteins [37], changing their physical, chemical and biological properties [38]. They may occur by means of cleavages or by the addition of a chemical group to one or more amino acids [39]. The main goals of PTM studies in proteomics are identifying the proteins that carry them, mapping the sites where these modifications occur, quantifying their occurrence at the different sites and characterizing cooperative PTMs [40]. Because covalent modifications change the molecular mass of a protein, the modifications and the amino acids that carry them can be identified by mass spectrometry, and more than 300 different types of PTMs have been identified so far with the aid of this technique. Nevertheless, according to Mann and Jensen (2003) [41], mass spectrometry has reduced power of resolution for PTMs because they occur at low stoichiometric levels. This problem may be resolved by adopting fractionation methods prior to sequencing that enrich the sample for the proteins that carry a certain type of PTM. Large scale enrichment of modified proteins is generally carried out by means of affinity chromatography.
One example is the IMAC system (immobilized metal affinity chromatography) for isolation of phosphorylated proteins, in which Fe(III) ions are bound to the matrix. The Fe(III) ion interacts reversibly with the phosphate group of the modified peptide, keeping it attached to the column and thus isolating the proteins that have phosphorylated residues [41].
In contrast, for more stable PTMs such as glycosylation, low stoichiometry is not an issue, but the added carbohydrates hinder the proteolytic digestion necessary for identification by mass spectrometry [21]. In addition, when the modified peptide is fragmented for sequencing, it loses sugar residues, impeding the identification of the modified amino acids. To resolve this problem, the proteins are digested so as to remove the sugar residues and to leave a modification at the formerly glycosylated site that makes it identifiable [42].
Electrophoresis gels may also be used to enrich samples for PTMs, as is done for detection of phosphorylation and glycosylation with commercially available kits. The modified proteins, specifically labeled in the gel, are visualized and excised for identification by mass spectrometry. One important advantage of using gels for identification of PTMs is the possibility of visualizing the spots that carry the PTM and are differentially expressed among samples.
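The principle that a covalent modification changes the measured mass can be illustrated by comparing an observed peptide mass with the mass expected for the unmodified sequence. The sketch below checks the difference against the monoisotopic mass shifts of three common modifications; the short table, the tolerance and the function name are assumptions, and real searches draw on far larger modification lists.

```python
from typing import List, Tuple

# Monoisotopic mass added by each modification, in Da (small illustrative subset).
PTM_MASS_SHIFT = {
    "phosphorylation": 79.96633,
    "acetylation": 42.01057,
    "methylation": 14.01565,
}


def candidate_ptms(observed_mass: float,
                   unmodified_mass: float,
                   tolerance: float = 0.01,
                   max_copies: int = 3) -> List[Tuple[str, int]]:
    """Return (modification, copies) pairs whose total shift explains the mass gain."""
    delta = observed_mass - unmodified_mass
    hits = []
    for name, shift in PTM_MASS_SHIFT.items():
        for copies in range(1, max_copies + 1):
            if abs(delta - copies * shift) <= tolerance:
                hits.append((name, copies))
    return hits


if __name__ == "__main__":
    # Invented masses: a peptide of 1000 Da observed with one and two phosphate groups.
    print(candidate_ptms(1079.96633, 1000.0))
    print(candidate_ptms(1159.93266, 1000.0))
```

The tolerance matters here: three methyl groups (42.047 Da) and one acetyl group (42.011 Da) differ by only about 0.04 Da, so an instrument with lower mass accuracy could not distinguish them by mass alone.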

Food safety
In the case of food, proteins are especially important for evaluation of food safety because they may place consumer health at risk. This is because proteins may be involved in the synthesis of toxins and antinutrients, or may themselves be a toxin, an antinutrient or even an allergen [43].
Soybeans are an important source of food throughout the world, being consumed in daily meals of all types. They have also been widely used as a food substitute by people who are intolerant to lactose or to milk proteins [44]. Nevertheless, proteins considered allergenic are also found in this species. Thus, knowledge of the proteins with toxic or antinutritional potential present in this grain is fundamental for the development of biotechnological strategies aimed at eliminating or inactivating, in the genome of this species, the genes that code for these proteins.
Therefore, application of proteomic analysis in this type of study has been widely discussed.
In relation to products derived from genetically modified (GM) foods, proteomic techniques have been applied because they allow a wide-ranging approach and analysis of many variables simultaneously in the same sample [45]. Ocana et al. (2007) [46], studying GM proteins present in soybean and maize samples using proteomic analysis, identified the protein CP4 EPSPS, which confers tolerance to glyphosate herbicide. These samples were submitted to specific separation techniques followed by two-dimensional electrophoresis and mass spectrometry for detection and characterization of the proteome.
With regard to allergies, various allergens belonging to the superfamilies of cupins and prolamins have been identified in soybeans [47]. Research has suggested that a heterogeneous group of soybean proteins bind IgE antibodies and are potential allergens, for example, Gly m Bd 30K, β-conglycinin, Gly m Bd 28K, glycinin, Kunitz type protease inhibitor, some proteins present in the hull (Gly m 1.0101, Gly m 1.0102 and Gly m 2), profilin (Gly m 3), SAM 22 (Gly m 4), and other allergens like lectin and lipoxygenase [47,48]. According to Wilson et al. (2005) [49], now that allergens have been identified in soybeans, the challenge for food researchers is to develop a process for eradicating the immunodominant allergens while maintaining the functionality, nutritional value and effectiveness of the products derived from soybeans. For that reason, research has used genetic engineering to silence the soybean gene responsible for synthesis of the protein Gly m Bd 30K, one of the main soybean proteins that react with the sera of sensitive patients [44].

Biotic and abiotic factors
In a similar manner, various studies have shown that the proteomic approach is highly useful for investigation of crop response to environmental stresses because it compares the way the proteome is affected by different physiological conditions. Saline stress is one of the many types of abiotic stress that affect plants and compromise their yield. Salinity is a common agricultural problem in arid and semiarid regions and creates large unproductive areas, so there has been an ever greater search for cultivars adaptable to this condition. Sobhanian et al. (2010) [10] used proteomic techniques to evaluate the proteins in leaves, hypocotyls and roots of plants submitted to different NaCl concentrations (Figure 9) to induce saline stress.
The results in soybeans suggest that, in adaptation to saline conditions, proteins perform different roles in each organ, and that the proteins most affected by saline stress are those related to photosynthesis. As a result, there is less energy production and, consequently, reduced plant growth. The authors suggest that the glyceraldehyde-3-phosphate dehydrogenase gene may in the future be one of the target genes for improving tolerance to saline stress in this species.
Another type of abiotic stress studied in soybeans with a proteomic approach is flooding stress [50,51]. In areas subject to flooding the root environment becomes anoxic, affecting nodulation or root growth. Plants respond to this condition with greater or lesser efficiency, which allows cultivars tolerant and intolerant to this stress to be distinguished.
Proteomic analyses of soybean seedlings in response to flooding were undertaken by Shi et al. (2008) [52] to identify the key proteins involved in this process. To identify the first proteins produced in response to flooding, proteins were extracted from the roots of the seedlings. The two-dimensional gel results suggest that cytosolic ascorbate peroxidase 2 (cAPX 2) is involved in the response to flooding stress in young soybean seedlings.
In the case of drought stress, up-regulation of reactive oxygen species (ROS) scavengers such as superoxide dismutase (SOD) was reported in soybean seedlings [53]. Proteome analysis of two-day-old soybean seedlings subjected to drought stress by withholding water for two days revealed a variety of responsive proteins involved in metabolism, disease/defense and energy, including protease inhibitors [53]. The major reason for loss of crop yield under drought stress is a decrease in carbon gain through photosynthesis. Proteome analysis of soybean roots under drought conditions showed that two key enzymes involved in carbohydrate metabolism, UDP-glucose pyrophosphorylase and 2,3-bisphosphoglycerate independent phosphoglycerate mutase, were down-regulated upon exposure to drought [54]. The identification of proteins such as these has provided new insights that may lead to a better understanding of the molecular basis of responses to drought stress in soybean.
Toxicity stress caused by high quantities of aluminum in the soil has also been investigated in soybeans from the perspective of proteomics [55,56]. Duressa et al. (2011) [56], studying cultivars tolerant and susceptible to high doses of aluminum, performed proteomic analyses of roots and concluded that selecting for greater expression of enzymes involved in citrate synthesis would be a good strategy in the search for cultivars tolerant to this metal (Figure 10).
Another focus of study in the selection of superior soybean genotypes using the proteomic approach is exposure to ultraviolet radiation, which has gained importance with worldwide concern over global environmental change and degradation of the ozone layer. Xu et al. (2007) [57] studied the proteome of soybean leaves to investigate the protective role of flavonoids against UV-B radiation. The authors suggest that high levels of flavonoids reduce the sensitivity of the plant to this radiation.
In relation to biotic stresses caused by pathogens like fungi, bacteria, nematodes and viruses, proteomic tools are also widely used because they allow understanding of the plant-pathogen relationship [11,58,59,60] and of how the nodulation process occurs through symbiosis between soybean roots and rhizobia [61]. In these cases, proteomic analysis provides the information that will be used by genetic breeding in the search for cultivars resistant to various diseases. In addition, proteomic studies that deal with seed development also play an essential role [62]. The data obtained may help to interpret the function of genes that determine protein concentration, considered a key characteristic for genetic breeding of soybeans. Moreover, differential proteomic analyses designed to describe the changes that occur from maturation to senescence in organs and organelles have been reported. There is also already a soybean proteome database, providing information on the proteins involved in the soybean response to stress caused by drought, salinity and, principally, flooding [63].

Final considerations
In light of the above, proteomics contributes to diverse biotechnological applications in soybean studies, and its approach has proven to be fundamental. Its use in the search for superior soybean materials has the purpose of comparing genotypes that contrast for a given type of stress and identifying the proteins that respond to the stress by means of changes in their levels of expression. The identification of these molecules and their respective functions will allow breeding work to be directed, so that it continues only with those that perform roles related to the characteristic of stress tolerance.
For that reason, it is essential to cross proteomic data with information gathered from genomics, transcriptomics and metabolomics so as to check the correlation of the candidate proteins with the desired characteristic. The following stage aims to evaluate these proteins (genes) in regard to their segregation with the characteristic of interest or quantitative trait locus (QTL), that is, to determine how much each one of them contributes to the characteristic of tolerance. Finally, the selected genes may be integrated in marker assisted selection (MAS) or in genetic transformation programs.

Figure 1. Pathways in which gene and protein expression may be regulated or modified in transcription or in post-translation [13].
Zhang et al. (2011) [58] evaluated the responses of cultivars tolerant and susceptible to the fungus Phytophthora sojae by means of two-dimensional electrophoresis. The authors observed 46 differentially expressed proteins (Figure 11), among which only 11% were related to plant defense.

Figure 11. Identification of 26 and 20 protein spots from Yudou25 (A) and NG6255 (B), respectively. The numbers with arrows indicate the differentially expressed protein spots. Ip and Mr are shown on the gels [58].