Glycinin A3B4 mRNA: CLONING AND SEQUENCING OF DOUBLE-STRANDED cDNA COMPLEMENTARY TO A SOYBEAN STORAGE PROTEIN*

The cDNA clones encoding the precursor form of glycinin A3B4 subunit have been identified from a library of soybean cotyledonary cDNA clones in the plasmid pBR322 by a combination of differential colony hybridizations, and then by immunoprecipitation of hybrid-selected translation product with A3-monospecific antiserum. A recombinant plasmid, designated pGA3B4 1425, from one of six clones covering codons for the NH2-terminal region of the subunit was sequenced, and the amino acid sequence was inferred from the nucleotide sequence, which showed that the mRNA codes for a precursor protein of 516 amino acids. Analysis of this cDNA also showed that it contained 1786 nucleotides of mRNA sequence with a 5'-terminal nontranslated region of 46 nucleotides, a signal peptide region corresponding to 24 amino acids, an A3 acidic subunit region corresponding to 320 amino acids followed by a B4 basic subunit region corresponding to 172 amino acids, and a 3'-terminal nontranslated region of 192 nucleotides, which contained two characteristic AAUAAA sequences that ended 110 nucleotides and 26 nucleotides from a 3'-terminal poly(A) segment, respectively. Our results confirm that glycinin is synthesized as precursor polypeptides which undergo post-translational processing to form the nonrandom polypeptide pairs via disulfide bonds.
The inferred amino acid sequence of the mature basic subunit, B4, was compared to that of the basic subunit of pea legumin, Leg β, which contained 185 amino acids. Using an alignment that permitted a maximum homology of amino acids, it was found that overall 42% of the amino acid positions are identical in both proteins. These results led us to conclude that both storage proteins have a common ancestor.
Glycinin, one of the predominant globulin components stored in protein bodies of soybean (Glycine max (L.) Merr.) seeds, is composed of six subunits including unique acidic (A) and basic (B) polypeptides linked together via a disulfide bond to form A-B intermediary subunit complexes (1,2). There are several specific A-B subunit pairs, since at least six different acidic (Mr ~35,000-45,000) and five basic components (Mr ~22,000) were identified unambiguously on the basis of differences in the partial primary structure of their NH2-terminal regions (3). These acidic subunits were designated A1a, A1b, A2, A3, A4, and A5, while the basic ones were designated B1a, B1b, B2, B3, and B4 (2). Recent work (4,5) has suggested that the acidic and basic subunits of a glycinin molecule are synthesized from the messenger RNA encoding the A-B subunit precursor, which may be subsequently cleaved to form the A and the B subunits in a post-translational processing step. The A3B4 subunit has been studied further. The A3 component of the A3B4 subunit pair has the largest molecular weight (Mr ~45,000) on SDS gels among the glycinin subunits (29) and strong relatedness to A5 and A4 acidic subunits. We isolated several clones encoding the A3 subunit family such as the A3B4 and A5A4B3 subunits of glycinin and demonstrate here the complete nucleotide sequence of the cloned DNA which contained the entire translated region of A3B4 subunit mRNA. By comparing the sequence of the A3B4 subunit cDNA with the NH2-terminal amino acid sequence of A3 acidic (3, 6) and B4 basic (3) subunits, we deduce that the glycinin A3B4 subunit precursor consists of a signal peptide corresponding to 24 amino acids and the A3 acidic subunit followed by the B4 basic subunit.
* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
‡ To whom correspondence should be addressed.

EXPERIMENTAL PROCEDURES
Harvesting and Storage of Cotyledons-Glycine max var. Bonminori, which is an early ripening variety and takes about 60 days after flowering to mature its seeds, was used for this experiment. Seeds of this variety were grown in our agronomy field and pods were harvested at a middle stage of development (38 days after flowering) as described previously (7). The seeds were aseptically removed from the pods, and the cotyledonary tissues separated from the testa and the embryo were frozen in liquid N2 to retain polysome integrity and used immediately or kept at -80 °C until needed.
Preparation of RNA-Total RNA was extracted from frozen cotyledonary tissue by a modification of the method of Brawerman et al. (8). Briefly, soybean cotyledonary tissues crushed in liquid N2 (about 40 g) were mixed with 4 volumes of 85% phenol. An equal volume of extraction buffer (50 mM Tris-HCl, pH 9.0, and 1% SDS) was added, and the mixture was homogenized with a Physcotron (Nichion Rikka Co. Japan, NS-500 T) at speed 80 for 3-4 min. The homogenate was stirred for 30 min at room temperature, and then the phases were separated by centrifugation. The aqueous phase was extracted 3 times with an equal volume of 85% phenol. The organic phases were combined and reextracted with fresh buffer. RNA was precipitated by bringing the combined aqueous phases to 0. LiCl (pH 6.5) at 4 °C for 12 h. The resulting precipitate was dissolved in water, and 2 volumes of chilled ethanol were added. The resultant precipitate was rinsed 3 times with 80% chilled ethanol and dried in vacuo. This RNA pellet was dissolved in sterile water for further experiments.
Poly(A)-containing RNA was prepared by oligo(dT)-cellulose (Collaborative Research, type III) affinity chromatography (9). Poly(A)-containing RNA was fractionated by sedimentation through 5-29.9% isokinetic sucrose gradients, and individual fractions (0.7 ml each) were analyzed in an in vitro translation system as described below.
Cell-free Translation and Immunoprecipitation-Poly(A)-containing RNA was translated in a protein-synthesizing system derived from micrococcal nuclease-treated rabbit reticulocyte lysates (10) with minor modifications as described previously (11). Translation products were immunoprecipitated by mixing 20 µl of ribonuclease A-treated lysate reaction mixture with 100 µg of monospecific antibody in the presence of 1% sodium deoxycholate, 0.1% SDS, 150 mM NaCl, 50 mM LiCl, 10 mM EDTA, 1 mM phenylmethylsulfonyl fluoride, and 50 mM Tris-HCl, pH 7.2. The mixtures were incubated at 37 °C for 1 h, then at 4 °C for 12 h. Formaldehyde-fixed Staphylococcus aureus (IgG-SORB, The Enzyme Center, Boston, MA) was added in an amount sufficient to bind about 210 µg of IgG and mixed for 1 h at room temperature. The S. aureus was collected by centrifugation, and the pellets were washed 6 times with 800 µl of the above Tris buffer by repeated resuspension (vortexing and sonication) and by centrifugation. Immunoprecipitates were dissolved by resuspending the pellets in 50 mM Tris-HCl, pH 7.2, containing 1% SDS and 4 mM dithiothreitol, and heating them at 100 °C for 3 min. Translation products and the immunoprecipitates were electrophoresed on 12.5% polyacrylamide gels containing 0.1% SDS, according to Laemmli (12). The resultant gels were impregnated with commercial fluor (EN3HANCE, New England Nuclear), according to the manufacturer's directions, and fluorographed at -75 °C.
Purification of Glycinin Subunits-Glycinin A3 and A, subunits were purified from defatted soybean flour according to Kitamura et al. (1).

Preparation and Characterization of Monospecific Antibodies to Glycinin A3 Subunit-Antibodies against A3 subunit were elicited from random-bred albino rabbits, weighing 2.5 to 3.1 kg, according to the immunization schedule described previously (7). The antiserum obtained was absorbed with other glycinin acidic subunits such as A1a, A1b, A2, A4, and A5, and the resultant antiserum was judged to be monospecific by protein-blotting experiments using a commercial kit of an enzyme immunoassay (The Bio-Rad Immun-Blot Assay Kit), according to the manufacturer's directions. All the subunits of glycinin (about 600 ng of protein) were transferred and bound to the nitrocellulose membrane electrophoretically, following separation of glycinin subunits in a 12.5% polyacrylamide gel containing 0.1% SDS, according to Laemmli (12). The membrane with bound subunits was immersed into a blocking solution which consisted of 20 mM Tris-HCl, pH 7.5, 0.5 M NaCl, and 3% gelatin for 30 min at room temperature. After the solution was removed, the membrane was incubated with first antibodies which were specific for A3 subunit (1:120 dilution), washed to remove unbound antibodies, incubated with second antibodies which were conjugated with horseradish peroxidase by the above manufacturer, and immersed into the development solution.
Construction of a Soybean Cotyledonary ds-cDNA Recombinant Plasmid Library-Double-stranded cDNA was synthesized from poly(A)-containing RNA derived from the soybean cotyledonary tissue at the middle stage (38 days after flowering) of seed development essentially as described by Land et al. (13) with minor modifications. Briefly, 10 µg of poly(A)-containing RNA was used as a template for the synthesis of single-stranded cDNA with an oligo(dT) primer and avian myeloblastosis virus reverse transcriptase (Midwest Bioproducts Inc.). The 3'-terminus of the single-stranded cDNA was tailed by terminal transferase (P-L Biochemicals) with oligo(dC), then annealed to oligo(dG)12-18 to prime second strand cDNA synthesis by reverse transcriptase, followed by 40 units of Klenow fragment of DNA polymerase I (P-L Biochemicals). The ds-cDNA was electrophoresed on a 6.5% polyacrylamide gel in 100 mM Tris, 100 mM boric acid, and 2.5 mM EDTA at pH 8.3, and material larger than 600 base pairs was extracted. Resultant ds-cDNA was tailed with oligo(dC) and inserted into the PstI site of oligo(dG)-tailed pBR322, followed by transfection of Escherichia coli RR1 as described by Dagert and Ehrlich (14). Transformants were selected by growth on tetracycline. Recombinant work was performed under containment conditions in accordance with the Genetic Manipulation Advisory Group (Japan) guidelines.
Identification of Transformants Containing the 5'-Coding Region of the Nucleotide Sequence Corresponding to the Glycinin A3 Subunit Family-About 2200 tetracycline-resistant, ampicillin-sensitive transformants were selected and screened by colony DNA-filter hybridization to 32P-labeled cDNAs prepared from RNA fractions that were enriched or deficient in glycinin A3-type subunit mRNAs and to a 32P-labeled oligonucleotide probe constructed to correspond to all possible codons for the NH2-terminal amino acid residues 5-9 (Lys-Phe-Asn-Glu-Cys), which is a unique sequence of the A3 subunit family (3). Fig. 2 shows the sequence of mRNA from 5' to 3' specifying glycinin A3 subunit amino acids 5-9, as well as the cDNA sequence from 3' to 5'. This cDNA sequence was synthesized by a solid phase phosphate triester method using the reaction conditions and procedures of Beaucage and Caruthers (15). After completion of the synthesis, the oligonucleotides were labeled at the 5'-OH end with T4 polynucleotide kinase (Toyobo Biochemicals) and [γ-32P]ATP (3000 Ci/mmol, Amersham) to a specific activity of greater than 10^7 cpm/µg and then purified by 20% polyacrylamide gel electrophoresis in 100 mM Tris, 100 mM boric acid, 2.5 mM EDTA, and 7 M urea (pH 8.3). Transformed bacterial colonies were allowed to grow on nitrocellulose membrane filters on tetracycline (15 µg/ml) plates for 12 h at 37 °C, after which the bacteria-grown filters were transferred to chloramphenicol (170 µg/ml) plates for plasmid DNA amplification. The resultant colonies were lysed on the filters, neutralized, baked, and prepared for hybridization with 32P-labeled single-stranded cDNAs synthesized from glycinin subunit-enriched and -deficient poly(A)-containing RNA fractions, respectively, according to Grunstein and Hogness (16).
Colonies that showed a strong hybridization signal with the glycinin subunit-enriched cDNA fraction and a weak hybridization signal with the glycinin subunit-deficient cDNA fraction were selected for further colony hybridization with the synthetic oligonucleotide probe. After prehybridization for 4 h at 55 °C in 6 × SSC (1 × SSC is 0.15 M NaCl, 0.015 M sodium citrate, pH 7.0), 5 × Denhardt's solution (1 × is 0.02% bovine serum albumin, 0.02% Ficoll 400, 0.02% polyvinylpyrrolidone), 0.5% SDS, 10% dextran sulfate, and sonicated heat-denatured herring DNA at 100 µg/ml, hybridization was carried out for 16 h at 32 °C in 6 × SSC, 5 × Denhardt's solution, 0.5% SDS, and 10% dextran sulfate containing the 5'-end-labeled synthetic oligonucleotide probe. After hybridization, the filters were washed in succession for 3 min at 4 °C (5 times), 15 min at 25 °C (2 times), 5 min at 37 °C (1 time), and 2 min at 41 °C (1 time) in 6 × SSC, 0.1% pyrophosphate, and positive colonies were detected by overnight autoradiography at -75 °C with an intensifying screen.
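The mixed probe described above can be illustrated with a short enumeration. The following is a minimal sketch, not the authors' procedure: it assumes the probe spans all 15 nucleotides of the five codons (the actual probe length is given only in Fig. 2, which is not reproduced here), and it adds a rough melting-temperature estimate from the Wallace rule (2 °C per A·T pair, 4 °C per G·C pair), which the source does not mention. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code entries for the five residues (mRNA sense, 5'->3').
CODONS = {
    "Lys": ["AAA", "AAG"],
    "Phe": ["UUU", "UUC"],
    "Asn": ["AAU", "AAC"],
    "Glu": ["GAA", "GAG"],
    "Cys": ["UGU", "UGC"],
}
PEPTIDE = ["Lys", "Phe", "Asn", "Glu", "Cys"]  # residues 5-9 of the A3 family

# Base pairing used to write the complementary DNA strand.
DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def mrna_variants(peptide):
    """Every mRNA sequence that can encode the peptide."""
    return ["".join(combo) for combo in product(*(CODONS[aa] for aa in peptide))]


def complementary_dna(mrna):
    """DNA strand complementary to the mRNA, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(mrna))


def wallace_tm(dna):
    """Rough melting temperature in deg C (assumption: Wallace rule)."""
    gc = sum(base in "GC" for base in dna)
    return 2 * (len(dna) - gc) + 4 * gc


variants = mrna_variants(PEPTIDE)
probes = [complementary_dna(m) for m in variants]
tms = [wallace_tm(p) for p in probes]
print(f"{len(probes)} probe sequences, estimated Tm {min(tms)}-{max(tms)} deg C")
```

Under these assumptions each residue has two codons, so the mixture contains 32 sequences, and the estimated melting temperatures bracket the final wash temperature used above. The estimate is only a rule of thumb for short oligonucleotides.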
Plasmid DNA Preparations and Restriction Analysis-Bacterial clones were grown in Luria broth supplemented with 15 µg/ml of tetracycline. Plasmid DNA was isolated using the alkaline lysis method (17) and further purified on CsCl gradients. Restriction enzymes were purchased from Nippon Gene Inc. and New England Biolabs, and incubation conditions were as recommended by the manufacturer.
Positive Hybridization Selection of mRNA-A recombinant plasmid, pGA3B4 1425, containing a ds-cDNA insert of about 1800 bp, was selected as one of six possible candidates for containing the NH2-terminal region of the glycinin A3 subunit sequence, as the result of colony hybridization as described above. In order to establish the identity of the putative A3B4 subunit plasmid, 3 µg of the plasmid pGA3B4 were adjusted to 0.75 N NH4OH and 2 M NaCl in a volume of 20 µl, heated at 98 °C for 3 min, and immediately spotted on a 1-cm2 nitrocellulose filter (Schleicher & Schuell). The filter was air-dried and baked at 80 °C under vacuum. The filter was cut into small pieces and incubated in 65% formamide, 0.4 M NaCl, 3 mM EDTA, and 10 mM Pipes (pH 6.4) at 50 °C for 5 min in five changes of the above buffer. Then, poly(A)-containing RNA (25 µg) in 100 µl of the above buffer was hybridized to the filters by incubation for 2 h at 52 °C and for 4 h with a decrease in temperature from 52 to 45 °C. Filters were washed 10 times for 10 min each, at 65 °C in 75 mM NaCl, 7.5 mM sodium citrate, 0.5% SDS, pH 7.4. The filters were further washed 3 times in 2.0 mM EDTA, pH 7.9, for 3 min each at 60 °C. Hybridized RNA was eluted by two successive 2-min boilings with 150 µl of deionized water, and the eluted RNA was quickly frozen in dry ice. Then, the RNA was precipitated with ethanol after the addition of 20 µg of E. coli tRNA. This hybridization-selected RNA was translated in the cell-free, protein-synthesizing system, and the reaction mixture was immunoprecipitated with antibodies specific for the A3 acidic subunit of glycinin.
RNA Blot Analysis-To estimate the size of the mRNAs encoding the glycinin A3 subunit family, poly(A)-containing RNA was analyzed by RNA blot analysis essentially according to Thomas (18). Briefly, 10 µg of RNA were denatured with glyoxal and resolved by electrophoresis in 1% agarose gels in 10 mM sodium phosphate, pH 6.8, for 4 h at 5 V/cm. The RNA was transferred overnight to nitrocellulose by blotting, and the nitrocellulose was baked for 2 h at 80 °C under vacuum. Complete reversal of glyoxylation was achieved by soaking the baked blots in 0.3 M sodium citrate containing 3 M NaCl for 12 h. Blots were hybridized to a 32P-labeled 162-bp PstI-MluI restriction endonuclease DNA fragment which contains a unique sequence for the glycinin A3 subunit family.
Nucleotide Sequencing-Nucleotide sequencing was performed according to the procedures of Maxam and Gilbert (19). Restriction endonuclease-digested DNA fragments were end-labeled at their 5'-termini with [γ-32P]ATP (Amersham), after treatment with calf intestine alkaline phosphatase (Boehringer Mannheim), and were end-labeled at their 3'-termini with [α-32P]dideoxy-ATP (Amersham) and terminal transferase (P-L Biochemicals). Nucleotide sequences were examined by computer analysis (Software Developing Company, Japan).

RESULTS AND DISCUSSION
As there is a strong relatedness among the glycinin subunits in amino acid sequences (2, 3) and in an immunological analysis (20), it has been difficult to identify plasmids containing the cDNA encoding the precursor form of a specific glycinin subunit using immunological techniques such as hybrid-selection and hybrid-arrest translations. In order to identify the cDNA clones covering codons for the NH2-terminal region of the glycinin A3 subunit family (A3B4 and A5A4B3 subunits) from a library of soybean cotyledonary cDNA clones in the plasmid pBR322, we employed the following three procedures. 1) The A3 subunit monospecific antibodies which could be used for a positive hybridization selection of mRNA were prepared by an immunological adsorption technique (C. Fukazawa and K. Udaka, manuscript in preparation). The antiserum elicited by hyperimmunization with the A3 subunit was treated with purified A1a, A1b, A2, A4, and A5 subunits of glycinin. Fig. 1 demonstrates that the absorbed antibodies are monospecific for the A3 subunit of glycinin in an enzyme immunoblot assay system, whereas there is a strong cross-reactivity between A5 + A4 and A3 subunits in an immunological sense. Using this antiserum, several recombinant plasmids selected as possible candidates for containing the A3 subunit sequence were used for the positive hybrid selection. 2) Total poly(A)-containing RNA isolated from soybean developing cotyledonary tissue at 38 days after flowering was fractionated by sedimentation through 5-29.9% isokinetic sucrose gradients, and the individual fractions of the sucrose gradient centrifugation were translated in the presence of [3H]leucine in a mRNA-dependent, protein-synthesizing system derived from rabbit reticulocyte lysates. The total translation products were immunoprecipitated with antibodies prepared against the A3 subunit of glycinin as described above and examined by gel electrophoresis and fluorography. Glycinin A3B4 subunit precursor was detected as Mr = 61,000 (data not shown). This value is in agreement with the previously reported value for the molecular weight of one of the total immunoprecipitated products with anti-glycinin IgG (4). This result also showed that A3B4 subunit mRNA was enriched in three gradient fractions numbered 6-8. To screen the cDNA library by colony DNA-filter hybridization, RNA fractions that were enriched (fraction numbers 6 and 8) and that were deficient in glycinin A3B4 mRNA (fraction 10) were employed, respectively. 3) To select the plasmids covering the nucleotide sequence corresponding to the NH2-terminal region of the A3 subunit family, a mixed oligonucleotide probe was constructed to correspond to a unique sequence of the subunit as shown in Fig. 2. Using this 32P-labeled oligonucleotide probe, plasmids from colonies that hybridized intensely with fraction 6 and 8 cDNAs, but not with fraction 10 cDNA, were further characterized, and the lengths of recombinant ds-cDNA were determined by restriction endonuclease digestion and electrophoresis.

FIG. 1. Demonstration of glycinin A3 subunit monospecificity of the antiserum, prepared by adsorption technique, using an immunoblot assay system. A crude glycinin preparation treated with 0.1% SDS and dithiothreitol was electrophoresed on a 12.5% acrylamide gel containing 0.1% SDS, followed by Amido Black staining (A) or by protein blotting experiments using rabbit anti-A3 subunit antibodies previously adsorbed with A1a, A1b, A2, and A5 subunits (B), and with A1a, A1b, A2, A4, and A5 subunits (C).
A recombinant plasmid, pGA3B4 1425, containing a ds-cDNA insert of about 1800 bp, was selected as one of the longest possible candidates for containing the NH2-terminal unique sequence of the A3 subunit family by a combination of differential colony hybridizations described above.
To establish the identity of the putative glycinin A3B4 subunit plasmid, pGA3B4 1425, DNA was immobilized on nitrocellulose and hybridized to total cotyledonary poly(A)-containing RNA of the developing soybean seed. The specifically bound RNA was eluted and translated in the cell-free, protein-synthesizing system. As indicated in Fig. 3, lane 1, the pGA3B4 1425 DNA specifically hybridized to a mRNA that directed the synthesis of the polypeptide of Mr = 61,000. This translation product was immunoprecipitated with monospecific antibodies to the A3 subunit of glycinin (Fig. 3, lane 2), thus establishing the identity of the recombinant plasmid. The size of glycinin A3B4 subunit mRNA was estimated by hybridizing a 32P-labeled 162-bp PstI-MluI restriction endonuclease DNA fragment which contains a unique sequence of the A3 subunit family to nitrocellulose filter blots of poly(A)-containing RNA that had been denatured and partially resolved by agarose gel electrophoresis. As shown in Fig. 4, the restriction endonuclease DNA fragment probe hybridized to an RNA species of about 2000 nucleotides in length.

The boxes enclose amino acids that are identical in both glycinin and legumin. The single letter notation for amino acids is in accordance with the recommended convention of Dayhoff (28). The solid circles at several positions indicate the deletion spaces that were placed in each sequence to maximize the homology between the two proteins. The dashes represent the amino acid residues which have not been determined. The arrow indicates the cleavage sites of the subunit complexes of the two proteins.
The complete nucleotide sequence of the cloned glycinin A3B4 subunit ds-cDNA was determined. The restriction endonuclease cleavage fragments and the sequencing strategy used in the nucleotide sequence determination are indicated in Fig. 5. The various restriction fragments overlap in several regions, generating essentially complete sequence information for each of the two strands of the ds-cDNA. Sites of labeling were independently sequenced within other restriction endonuclease fragments.
The complete nucleotide sequence for the coding strand of cloned glycinin A3B4 subunit ds-cDNA is shown in Fig. 6. The protein synthesis termination codon for this mRNA is UAA, which is followed by 192 untranslated nucleotides in the 3'-region adjacent to a poly(A) segment. The sequence AAUAAA, characteristically found near the terminal region of the 3'-untranslated segment of most eukaryotic mRNAs (22), is located in the glycinin A3B4 subunit cDNA, ending 26 nucleotides upstream from the poly(A) segment. A second copy of this sequence is also present, ending 110 nucleotides upstream from the poly(A) segment.
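The way such distances are measured can be made explicit with a small script. This is a sketch on a placeholder sequence, since the real 3'-untranslated sequence is given only in Fig. 6: the toy region is 192 nucleotides of filler with two AAUAAA copies placed so that they end 110 and 26 nucleotides before the poly(A) segment. The reading of "ending N nucleotides upstream" as "N nucleotides lie between the last A of the motif and the first A of the tail" is an assumption, and both function names are hypothetical.

```python
MOTIF = "AAUAAA"


def signal_offsets(utr, motif=MOTIF):
    """For each motif copy, count the nucleotides between the end of the
    motif and the end of the untranslated region (start of the poly(A))."""
    offsets = []
    start = utr.find(motif)
    while start != -1:
        offsets.append(len(utr) - (start + len(motif)))
        start = utr.find(motif, start + 1)
    return offsets


def toy_utr(length=192, ends_before_tail=(110, 26), filler="C"):
    """Placeholder 3'-untranslated region: filler bases plus motif copies.
    This is NOT the glycinin sequence; only the geometry is taken from the text."""
    utr = [filler] * length
    for gap in ends_before_tail:
        end = length - gap  # index one past the last base of the motif
        utr[end - len(MOTIF):end] = MOTIF
    return "".join(utr)


utr = toy_utr()
print(signal_offsets(utr))
```

Applied to a real sequence, the same function would report every AAUAAA copy, including overlapping ones, so the result should be checked against the figure.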
The position of the NH2-terminal isoleucine residue of mature A3 subunit (with its codon beginning at position 73 in the nucleotide sequence shown in Fig. 6) was identified by a comparison to the partial amino acid sequence (3) and the complete amino acid sequence (6) of the NH2 terminus of glycinin A3 subunit. Preceding this isoleucine residue is a 24-amino acid segment, beginning with methionine, which is rich in hydrophobic amino acids and contains a basic amino acid such as lysine prior to this hydrophobic sequence. This amino acid segment may function as a signal peptide, characteristic for secreted proteins (22), which is probably removed during co-translational processing (23). The 5'-terminus of glycinin A3B4 subunit mRNA also contains an untranslated region preceding the initiator methionine (Fig. 6), of which 46 nucleotides are contained within the pGA3B4 1425 ds-cDNA insert. The possibility that glycinin A3B4 subunit mRNA may contain additional noncoding nucleotides at the extreme 5'-end was not further investigated.
Analysis of the coding region of this glycinin subunit mRNA indicates that the glycinin A3B4 subunit pair is synthesized from the messenger RNA encoding the A3 acidic subunit region corresponding to 320 amino acids, with a calculated molecular weight of 36,392, followed by the B4 basic subunit region corresponding to 172 amino acids, with a calculated molecular weight of 19,072, as the NH2-terminal sequence of B4 subunit deduced from the nucleotide sequence was in close agreement with the partial amino acid sequence reported previously by Tumer et al. (3). Fig. 6 also confirms that pairing between acidic and basic subunits of glycinin is nonrandom as suggested previously (5) and completely directed in the mRNA sequence and that glycinin synthesized as a precursor undergoes post-translational processing to form the specific polypeptide pair via a disulfide bond.
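The segment lengths quoted in the text can be cross-checked with simple arithmetic. The sketch below uses only numbers stated in this article; the variable names are illustrative. Two points are assumptions rather than statements from the source: that nucleotide numbering in Fig. 6 starts at the A of the initiator AUG (which would place the first mature codon at position 73), and that the stated total of 1786 nucleotides does not count the UAA stop codon separately from the 192-nucleotide 3' region.

```python
# Lengths stated in the text.
SIGNAL_AA, A3_AA, B4_AA = 24, 320, 172      # amino acids
UTR5_NT, UTR3_NT = 46, 192                   # nucleotides
A3_MW, B4_MW = 36_392, 19_072                # calculated molecular weights

precursor_aa = SIGNAL_AA + A3_AA + B4_AA     # full precursor length
coding_nt = 3 * precursor_aa                 # sense codons only
total_nt = UTR5_NT + coding_nt + UTR3_NT     # no separate stop codon added

# Assumption: position 1 is the A of the initiator AUG.
first_mature_codon = 3 * SIGNAL_AA + 1

mature_pair_mw = A3_MW + B4_MW
mean_residue_mw = mature_pair_mw / (A3_AA + B4_AA)

print(precursor_aa, coding_nt, total_nt, first_mature_codon)
print(mature_pair_mw, round(mean_residue_mw, 1))
```

The sum reproduces the 516-residue precursor and the 1786-nucleotide total. Adding three nucleotides for the stop codon would give 1789, so the source's counting convention for the stop codon is left open here.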
To investigate the existence of the linker sequence joining the acidic and the basic subunits, protein sequencing of the COOH termini of the A3 and B4 subunits was performed (C. Fukazawa, unpublished results). The results indicated that the COOH termini of the A3 and B4 subunits were Asn and Pro, respectively. Thus, there is no evidence for another processing event besides the co-translational processing and the above post-translational processing steps. This evidence is in agreement with that from pea legumin (24).
There is a slight preferential use of particular codons specifying certain amino acids in the glycinin A3 and B4 subunits (data not shown). Further analysis of the glycinin A3B4 subunit mRNA nucleotide sequence with the aid of specific computer programs did not reveal any additional distinctive features. Application of the Chou and Fasman algorithm (25,26) for predicting secondary structure indicates that both A3 and B4 subunits have a low degree of ordered structure such as α-helix and a relatively high degree of β-structure. These predicted secondary structures are in agreement with those deduced by optical rotatory dispersion and circular dichroism procedures (31-33).
The A3 acidic component of the A3B4 subunit contained 17% acidic residues, 16% amide residues, and 10% basic residues, whereas the B4 basic component contained 7, 15, and 11%, respectively. These acidic and basic amino acid residues were dispersed randomly throughout both molecules. To investigate the hydropathic character, both subunits were analyzed with a computer program (30). Two strings of consecutive hydrophilic residues were found at positions 254 to 265 and 281 to 293.
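The two analyses in this paragraph, residue-class composition and a search for runs of hydrophilic residues, can be sketched as follows. This is an illustration under stated assumptions, not the program cited as (30): the hydropathy values are the Kyte-Doolittle scale, a residue is called hydrophilic when its value is negative, histidine is counted as basic, and the input is the 12-residue cleavage-site peptide quoted in this article rather than the full subunit sequences (which appear only in Fig. 6). Function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale; the cited program is not specified).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
CLASSES = {"acidic": "DE", "amide": "NQ", "basic": "KRH"}  # His as basic: assumption


def composition(seq):
    """Percentage of residues in each class, rounded to one decimal."""
    return {
        name: round(100 * sum(aa in members for aa in seq) / len(seq), 1)
        for name, members in CLASSES.items()
    }


def hydrophilic_runs(seq, min_len=3):
    """1-based inclusive (start, end) of runs of consecutive hydrophilic residues."""
    runs, start = [], None
    for i, aa in enumerate(seq + "#"):  # sentinel closes a trailing run
        hydrophilic = KD.get(aa, 0.0) < 0
        if hydrophilic and start is None:
            start = i
        elif not hydrophilic and start is not None:
            if i - start >= min_len:
                runs.append((start + 1, i))
            start = None
    return runs


peptide = "GCQTRNGVEENI"  # Gly-Cys-Gln-Thr-Arg-Asn-Gly-Val-Glu-Glu-Asn-Ile
print(composition(peptide))
print(hydrophilic_runs(peptide))
```

With the full A3 and B4 sequences as input, the same two functions would give figures comparable to the percentages and hydrophilic strings reported above, subject to the choice of scale and threshold.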
It is of interest to know the mechanism of the post-translational cleavage of the glycinin A3B4 intermediary precursor into A3 and B4 subunits. The amino acid sequence around the cleavage site, Gly-Cys-Gln-Thr-Arg-Asn↓Gly-Val-Glu-Glu-Asn-Ile (arrow indicates the cleavage site), was also applied to the analysis by the Chou and Fasman algorithm. The result suggests that the COOH-terminal region of the A3 subunit may have β-structure, while the NH2-terminal region of B4 subunit is an α-helix. As the predicted A3B4 subunit precursor contained two additional Arg-Asn-Gly segments (Fig. 6), the predicted secondary structures of the amino acid sequences around these characteristic segments were analyzed by the same procedure described above. The results indicated that both amino acid sequences were predicted to have only β-structure. Therefore, it could be speculated that the maturation enzyme which cleaves the glycinin A-B subunit complex into acidic (A) and basic (B) subunits may recognize the specific β-structure-α-helix segment which has the unique Arg-Asn-Gly sequence, and cleave it at the site between Asn and Gly.
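Locating candidate Arg-Asn-Gly segments in a sequence is a simple motif search. The sketch below runs it on the cleavage-site peptide quoted above, converted to one-letter code; it finds motif positions only and makes no secondary-structure prediction, so it cannot by itself distinguish the true cleavage site from the two additional segments. Function names are hypothetical.

```python
THREE_TO_ONE = {
    "Gly": "G", "Cys": "C", "Gln": "Q", "Thr": "T", "Arg": "R",
    "Asn": "N", "Val": "V", "Glu": "E", "Ile": "I",
}


def to_one_letter(hyphenated):
    """Convert 'Gly-Cys-...' notation to a one-letter string."""
    return "".join(THREE_TO_ONE[res] for res in hyphenated.split("-"))


def rng_sites(seq, motif="RNG"):
    """1-based position of the Asn in every Arg-Asn-Gly copy.
    The bond after that Asn is the candidate cleavage point."""
    sites = []
    start = seq.find(motif)
    while start != -1:
        sites.append(start + 2)  # 0-based index of Asn is start+1
        start = seq.find(motif, start + 1)
    return sites


def cleave(seq, asn_position):
    """Split the sequence after the given 1-based Asn position."""
    return seq[:asn_position], seq[asn_position:]


peptide = to_one_letter("Gly-Cys-Gln-Thr-Arg-Asn-Gly-Val-Glu-Glu-Asn-Ile")
sites = rng_sites(peptide)
print(peptide, sites, cleave(peptide, sites[0]))
```

On this peptide the acidic-side fragment ends in Asn and the basic-side fragment begins with Gly, consistent with the cleavage between Asn and Gly proposed in the text.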
A comparison of the inferred amino acid sequence of glycinin A3B4 subunit with that of pea legumin, the pea storage protein which is analogous to glycinin, is shown in Fig. 7. In order to achieve maximum alignment of the amino acid positions between the two proteins, several hypothetical deletions were introduced. It was found that overall, 42% of the amino acid positions in the basic component of the two proteins are identical. Relatively long regions of non-homology could be identified at the COOH termini of both subunits, whereas relatively short regions of non-homology could be identified near the middle of those proteins.
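A percent-identity figure of this kind depends on how gapped positions are counted. The following sketch computes identity over aligned columns in which both sequences have a residue; that denominator is an assumption, since the text does not say how the 42% was normalized. The two aligned strings are made-up toy sequences, not the glycinin B4 or legumin sequences of Fig. 7, and the function names are hypothetical.

```python
def identity_counts(seq_a, seq_b, gap="-"):
    """Return (identical, compared) over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # deletion introduced to maximize homology
        compared += 1
        identical += a == b
    return identical, compared


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity, rounded to one decimal."""
    identical, compared = identity_counts(seq_a, seq_b, gap)
    return round(100 * identical / compared, 1)


# Hypothetical toy alignment for illustration only.
toy_a = "GVEENI-CTLK"
toy_b = "GLEETVRCT-K"
print(identity_counts(toy_a, toy_b), percent_identity(toy_a, toy_b))
```

Dividing instead by the full alignment length, or by the length of the shorter protein, would give a lower percentage for the same alignment, so the convention should be stated whenever such a value is reported.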
These results indicate that both storage proteins have a common ancestor and that gene rearrangement has occurred during evolution.