The amino acid sequence of a testis-specific basic protein that is associated with spermatogenesis.

The amino acid sequence of the COOH-terminal cyanogen bromide fragment (residues 12 to 54) of the testis-specific basic protein of the rat has been determined. This analysis completes the primary structure of the whole protein by overlapping the sequence of the 23 residues from the NH2 terminus previously published (Kistler, W. S., Noyes, C., and Heinrikson, R. L. (1974) Biochem. Biophys. Res. Commun. 57, 341-347).


The complete sequence of this small, highly basic protein is (residue numbers at left):

  1  NH2-Ser-Thr-Ser-Arg-Lys-Leu-Lys-Thr-His-Gly-
 11  Met-Arg-Arg-Gly-Lys-Asn-Arg-Ala-Pro-His-
 21  Lys-Gly-Val-Lys-Arg-Gly-Gly-Ser-Lys-Arg-
 31  Lys-Tyr-Arg-Lys-Ser-Ser-Leu-Lys-Ser-Arg-
 41  Lys-Arg-Gly-Asp-Ser-Ala-Asp-Arg-Asn-Tyr-
 51  Arg-Ser-His-Leu-COOH

* This work was supported by Grants HD-04592 and HD-07110 from the United States Public Health Service, by Grant GB-29098 from the National Science Foundation, and by Grant IN-41N from the American Cancer Society.

The adult testis of a number of mammalian species, including man, contains a small and highly basic protein of markedly restricted amino acid composition (1,2). This protein is readily extracted in soluble form by homogenization of the testis in dilute mineral acid (0.2 M H2SO4) and may be identified after further treatment of such extracts as a discrete stained band resulting from polyacrylamide gel electrophoresis under appropriate conditions (1,2). In the rat this protein could not be detected in extracts prepared from a large number of other organs, and therefore it has been assigned the provisional designation "testis-specific" (1,3). The testis-specific basic protein has been purified from both rat and man, and in each case it was found to be rich in arginine, lysine, glycine, and serine, but to lack six of the amino acids commonly found in proteins, namely, glutamic acid, glutamine, cysteine, isoleucine, phenylalanine, and tryptophan (1,2).
Although the function of this testicular protein is not known, several lines of evidence indicate that it is associated with the spermatogenic function of the male gonad rather than its role as an endocrine organ. Briefly, under a variety of conditions the presence of the testis-specific basic protein correlates with the occurrence of developing haploid cell types (spermatids) in the seminiferous tubules rather than with the functional state of the androgen-secreting interstitial cells of Leydig (1,2). Despite this apparent association of the testis-specific basic protein with the development of spermatids, in the rat this protein was undetectable in epididymal spermatozoa, the end product of spermatid maturation (1). Its absence from spermatozoa and its distinctive amino acid composition distinguish it from another class of small and basic proteins associated with spermatogenesis, the basic chromosomal proteins found in association with the DNA of sperm headpieces (1, 4-6).
Because of the wide occurrence of this testicular protein among placental mammals and because of its possible utility as a specific gene product for the study of spermatogenesis, we have been prompted to determine its primary structure.
In a previous report (3) the testis-specific basic protein of the rat was shown to consist of a polypeptide chain of 54 residues with a calculated mass of 6200 daltons.
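The chain length, the six missing amino acids, and the approximate mass can be checked directly against the sequence reported in this paper. The following is a minimal sketch: the one-letter transcription `SEQ`, the table of average residue masses (standard reference values), and the function names are assumptions introduced here for illustration and are not part of the original work.

```python
from collections import Counter

# One-letter transcription of the 54-residue sequence reported in this paper
# (the transcription is an illustration aid, not taken from the original).
SEQ = ("STSRKLKTHG" "MRRGKNRAPH" "KGVKRGGSKR"
       "KYRKSSLKSR" "KRGDSADRNY" "RSHL")

# Average residue masses in daltons (standard reference values, assumed here).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.015  # one water per chain for the free NH2 and COOH termini


def average_mass(seq):
    """Sum of average residue masses plus one water, in daltons."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


comp = Counter(SEQ)
# Amino acids in the standard set of 20 that never occur in the sequence.
absent = sorted(set(RESIDUE_MASS) - set(comp))

print("length:", len(SEQ))
print("absent residues:", absent)
print("Arg, Lys, Ser, Gly:", comp["R"], comp["K"], comp["S"], comp["G"])
print("approximate mass (daltons):", round(average_mass(SEQ)))
```

The absent set corresponds to Cys, Glu, Phe, Ile, Gln, and Trp, and the computed mass falls between 6200 and 6300 daltons, in line with the figure quoted above.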
The first 23 residues from the NH2 terminus were identified by automated Edman degradation (3). In addition it was shown that the protein can be cleaved by cyanogen bromide at the single methionyl residue located at position 11 to yield two fragments that can be isolated in pure form by gel filtration (3). In the current paper we present the results of studies that allow the determination of the remainder of the sequence of the rat testis-specific basic protein.

EXPERIMENTAL PROCEDURES
... semi-micro procedure of Peterson et al. (18), and the liberated residues were identified as described above. Assignments of COOH-terminal sequences were made by the rate and stoichiometry of release of amino acids from peptide or protein samples during hydrolysis with carboxypeptidase A. Enzyme crystals were washed with water and brought into solution by method 1 of Ambler (19). Digestion was performed in 0.2 M N-ethylmorpholine acetate adjusted to pH 8.5 or, when lysine was the anticipated COOH-terminal residue, in 0.3 M N-ethylmorpholine acetate at pH 9.0. Digestion was terminated by freezing followed by lyophilization.
Amino acids released were identified by applying samples of the digest dissolved in sodium citrate buffer directly to the amino acid analyzer.

RESULTS
In an earlier communication we reported the amino acid sequence of the first 23 residues of the testis-specific basic protein from the rat (3). Furthermore, it was shown that treatment of the intact protein with cyanogen bromide results in cleavage of the polypeptide chain at the single methionyl residue at position 11 to produce two fragments that are easily isolated in pure form by gel filtration (3). The remainder of the sequence of this 54-residue protein has been elucidated by a combination of methods including digestion with carboxypeptidase A, isolation and analysis of thermolysin fragments, and automated Edman degradation of the COOH-terminal cyanogen bromide fragment (CNBr II) from this molecule.
Hydrolysis with Carboxypeptidase A - Digestion of the intact protein with carboxypeptidase A at 34° resulted in the rapid and nearly quantitative release of leucine (Fig. 1). Histidine and serine were released at an identical rate and considerably more slowly than leucine. No other amino acids were liberated over the course of a 2-hour incubation. A similar digestion carried out at 4° allowed more precise observation of the rate of release of leucine, but, again, the rates of appearance of histidine and serine were indistinguishable. The COOH-terminal structure of the intact protein was thus established as: -(Ser,His)-Leu-COOH.
Fig. 1. ... At 120 min the yield of leucine was 9.3 nmol or 93%.

Isolation and Analysis of Thermolysin Fragments - Trial digestion of the intact protein with either trypsin, chymotrypsin, or thermolysin demonstrated that the protein is readily cleaved at numerous sites by all three proteolytic enzymes. Digestion with thermolysin gave rise to a reproducible set of fragments that appeared to map without overlap in the analytical system described under "Experimental Procedures" (Fig. 2). The mapping behavior of these fragments suggested that they could all be obtained in pure form by a two-step procedure involving paper chromatography followed by high voltage electrophoresis. Fragmentation with thermolysin was therefore carried out on a preparative scale, and indeed it was possible to obtain all of the principal peptides in good yield (Table I).
Five of these peptides (Th I, Th II, Th IIA, Th IIB, Th III) appear in the sequence previously reported (3) for the NH2-terminal 23 residues (cf. Fig. 6). The position of Th IV was readily established, since it contained the single valine in the protein, a residue already identified at position 23 (3). Of the three remaining peptides, only Th VII contained both a leucyl and a histidinyl residue (Table I), and in view of the carboxypeptidase A digestion described above, it was placed at the COOH terminus of the protein.
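The fragment pattern can be illustrated with a small in-silico digest of the final sequence. This is only a sketch under stated assumptions: cleavage is taken to occur on the amino side of every internal Leu, Met, Val, and Tyr (the residue set is inferred from the Discussion, where each hydrophobic residue is said to be followed by a basic one), alanine is treated as not cleaved, and the partial Thr 8-His 9 cleavage that gives Th IIA and Th IIB is not modeled. The names `SEQ`, `TARGETS`, and `thermolysin_fragments` are hypothetical.

```python
# One-letter transcription of the reported sequence (illustration aid).
SEQ = ("STSRKLKTHG" "MRRGKNRAPH" "KGVKRGGSKR"
       "KYRKSSLKSR" "KRGDSADRNY" "RSHL")

# Residues assumed to direct thermolysin cleavage in this protein.
TARGETS = set("LMVY")


def thermolysin_fragments(seq, targets=TARGETS):
    """Cleave on the amino side of each internal target residue.

    Returns (start, end, fragment) tuples with 1-based inclusive positions.
    The first and last residues are skipped so that no single terminal
    residue is split off.
    """
    cuts = [i for i in range(1, len(seq) - 1) if seq[i] in targets]
    bounds = [0] + cuts + [len(seq)]
    return [(a + 1, b, seq[a:b]) for a, b in zip(bounds, bounds[1:])]


for start, end, frag in thermolysin_fragments(SEQ):
    print(f"{start:2d}-{end:2d}  {frag}")
```

Under these assumptions the digest yields seven fragments, among them a 9-residue peptide beginning with Val 23, a pentapeptide at residues 32 to 36, a tridecapeptide at residues 37 to 49, and a COOH-terminal pentapeptide, which matches the sizes and positions the text gives for Th IV through Th VII.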
Partial Sequence Analysis of CNBr II - Automated Edman degradation of CNBr II (490 nmol) resulted in unambiguous identification of each residue liberated for the first 17 cycles. Due primarily to incomplete coupling and/or cleavage of the proline encountered on the eighth cycle, a severe overlap problem developed during the subsequent degradation. Hermodson et al. (14) have noted the generation of overlap accompanying automated Edman degradation of prolyl residues, and they found a satisfactory solution to consist of a temperature increase during the coupling phase of the Edman cycle. Accordingly, a second run was made with CNBr II (656 nmol), and the temperature for the coupling phase of cycle 8 was raised from the usual 52° to 58°. Unfortunately, this procedure did not increase the recovery of proline on the correct cycle, and the ensuing overlap was as severe as that encountered previously.
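Apart from the overlap introduced at proline, every Edman cycle is slightly less than quantitative, so the signal falls off geometrically with cycle number. The sketch below assumes a constant repetitive yield per cycle and uses the 93 to 94% and 96 to 97% ranges quoted in the legend to Fig. 4; the function names are hypothetical, and the constant-yield model is a simplification rather than the authors' calculation.

```python
def repetitive_yield(recovery_early, recovery_late, cycle_early, cycle_late):
    """Estimate the per-cycle yield from recoveries at two different cycles."""
    span = cycle_late - cycle_early
    return (recovery_late / recovery_early) ** (1.0 / span)


def relative_recovery(yield_per_cycle, cycle):
    """Expected recovery at a 1-based cycle, relative to cycle 1."""
    return yield_per_cycle ** (cycle - 1)


# Midpoints of the two ranges quoted for the first and second runs.
for label, y in (("first run", 0.935), ("second run", 0.965)):
    for cycle in (8, 17, 39):
        fraction = relative_recovery(y, cycle)
        print(f"{label}: cycle {cycle:2d} -> {fraction:.2f} of the cycle-1 signal")
```

Even a few percent difference in repetitive yield compounds into a large difference by the later cycles, which is consistent with the statement in the Fig. 4 legend that the second run was more decisive in the later part of the degradation.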
Because of the overlap problem described, it was generally impossible to place with confidence such recurrent components of this protein as lysine, arginine, and serine after the 17th cycle. Conversely, despite large gaps in the sequence analysis, the identification of certain amino acids could still be made with a high degree of confidence until the 39th cycle. The partial sequence thus generated is given in Fig. 3, and the quantitative nature of the degradation process, as illustrated by the recovery of representative residues, is depicted in Fig. 4. The sudden introduction of overlap at the proline encountered on the 8th cycle is evident, and the gradual development of this overlap to the point where maximum recovery of a given residue occurred one or even two cycles late can also be seen. Despite the recovery of these latter residues as a broad peak encompassing several cycles, it is clear that the correct assignment of the residues indicated in Fig. 4 most probably corresponds to the cycle on which the amino acid in question first appeared significantly above its background level. In this manner it was possible to place the remaining unassigned alanyl residue at position 46 of the intact protein and to document that the first of three unassigned Asx residues occurred at position 44 of the intact protein, with the remaining two necessarily located distal to that point. The location of tyrosyl residues at positions 32 and 50 of the intact protein completed a skeletal outline of sufficient detail to align thermolysin peptides Th V and Th VI on the basis of their compositions (Table I) and to confirm the COOH-terminal location of fragment Th VII (cf. Fig. 6). In the region of mutual overlap, residues 12 to 23 of the intact protein, the sequence generated agrees exactly with that previously established for the NH2 terminus of the intact protein (3).
In addition, it was possible to identify residue 16 of the intact protein, previously designated Asx (3), as being asparagine.
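Because CNBr II begins at residue 12, each Edman cycle on this fragment maps to a position in the intact protein by a fixed offset. The following sketch makes that bookkeeping explicit; the one-letter transcription `SEQ` and the helper names are assumptions for illustration, and the cleavage model ignores the conversion of methionine to homoserine.

```python
# One-letter transcription of the reported sequence (illustration aid).
SEQ = ("STSRKLKTHG" "MRRGKNRAPH" "KGVKRGGSKR"
       "KYRKSSLKSR" "KRGDSADRNY" "RSHL")


def cnbr_fragments(seq):
    """Split on the carboxyl side of each internal Met.

    Returns (start, end, fragment) tuples with 1-based inclusive positions.
    """
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        if aa == "M" and i < len(seq) - 1:
            fragments.append((start + 1, i + 1, seq[start:i + 1]))
            start = i + 1
    fragments.append((start + 1, len(seq), seq[start:]))
    return fragments


def cycle_to_position(cycle, fragment_start):
    """Position in the intact protein of the residue released on a cycle."""
    return fragment_start + cycle - 1


def position_to_cycle(position, fragment_start):
    """Edman cycle on which a given residue of the intact protein is released."""
    return position - fragment_start + 1


cnbr_ii_start = cnbr_fragments(SEQ)[1][0]  # second fragment, CNBr II
for position in (19, 32, 44, 46, 50):
    cycle = position_to_cycle(position, cnbr_ii_start)
    print(f"residue {position} ({SEQ[position - 1]}) -> cycle {cycle}")
```

With this offset the proline at position 19 falls on cycle 8 and the tyrosine at position 50 falls on cycle 39, the last cycle at which the text reports confident identifications.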
Sequence of Th IV - The first 7 residues of the 9-member peptide were assigned by automated Edman degradation of CNBr II (Fig. 3). From the composition (Table I) a single lysyl and a single arginyl residue remained unplaced. The peptide (9.5 nmol) was digested with 6.3 µg of carboxypeptidase A at 38° for 12 hours. Lysine (7.4 nmol) was the only amino acid released. Thus lysine is the COOH-terminal residue, leaving arginine as the penultimate residue.
Sequence of Th V - The peptide (20 nmol) was treated with 2.5 µg of carboxypeptidase A for 5 hours at 34°. Serine (11.7 nmol) and lysine (3.2 nmol) were the only amino acids released. The peptide (500 nmol) was then subjected to four cycles of automated Edman degradation.
Identification of the residue released in each case was made by direct hydrolysis of the thiazolinone derivatives with HI for conversion back to the parent amino acid. The following residues were identified: Tyr (162 nmol)-Arg (36 nmol)-Lys (120 nmol)-Ser (as alanine, 34 nmol), leaving serine as the COOH-terminal residue by difference (Table I). This peptide therefore comprises residues 32 to 36 in the intact protein (cf. Fig. 6) since the tyrosyl and 1 seryl residue had been placed previously during the analysis of CNBr II (Fig. 3).
Fig. 4. Recovery of representative residues during automated Edman degradation of CNBr II. The results plotted for the first 17 cycles are derived from quantitation of the phenylthiohydantoins detected by gas chromatography during the first of two runs with CNBr II as described in the text. On this run the repetitive yield, calculated on the basis of the total in and out of step recovery (15) of the alanine encountered on the 7th cycle, the valine encountered on the 12th cycle, and the leucine encountered on the 26th cycle, was on the order of 93 to 94%. The results for cycles 18 to 43, plotted with expanded scale on the ordinate, were derived from the second automated run by conversion of the liberated thiazolinone derivatives to the parent amino acids and subsequent analysis as described under "Experimental Procedures." In this case the results were normalized to the recovery of a standard amount of norleucine phenylthiohydantoin added to each sample prior to hydrolysis with HI. For the second run, the repetitive yield, calculated as described above, was on the order of 96 to 97%. The results are presented as a composite of the two runs because, owing to larger amounts of starting material and a better repetitive yield, the results from the second run are more decisive during the later part of the degradation. However, for the second run, only spot checks were made during the initial cycles so the quantitative record for the early part of the degradation was incomplete. The arrows indicate positional assignments.

Sequence of Th VI - The tridecapeptide (6.5 nmol) was treated with 1.3 µg of carboxypeptidase A for 7 hours at 37°. Approximately 3 nmol of either serine or asparagine, which coelute on the analyzer, were recovered.
Manual Edman degradation was performed on 300 nmol of this peptide, and the liberated residues were identified by a combination of the techniques described under "Experimental Procedures." A partial sequence was established with reference to the known composition (Table I). To obtain shorter fragments of this peptide, lysine residues were first rendered resistant to tryptic digestion by derivatization with citraconic anhydride.
After digestion of the citraconylated peptide, mapping of the tryptic fragments indicated that despite the 3 arginyl residues in this peptide, it had been split into only two fragments. These two peptides, T(c)1 and T(c)2, were separated by high voltage electrophoresis. The compositions of the two fragments are presented in Table II.
Following removal of the citraconyl blocking groups, further fragmentation of peptide T(c)2 was effected by treatment of 460 nmol of the peptide with 20 µg of trypsin in 1 ml of 0.1 M N-ethylmorpholine acetate at pH 8 containing 2 mM CaCl2 for 16 hours at 34°. Mapping of the digest delineated six ninhydrin-positive spots, two of which gave distinctive color reactions. An anionic peptide at pH 6.4 gave a yellow color suggesting an NH2-terminal glycyl residue while a neutral fragment gave a brown color identical with that displayed by free asparagine. Five of the six fragments were isolated by a combination of high voltage electrophoresis and paper chromatography. The compositional analyses of these fragments given in Table III allowed the determination of the complete sequence of T(c)2 (Fig. 5). Fragment T2 was identified as asparagine based on amino acid analysis before and after hydrolysis, thus confirming the preliminary assignment on the basis of ninhydrin color. In view of the specificity of trypsin and the composition of T1 in conjunction with the previously established partial sequence of Th VI, the COOH-terminal 3 residues of Th VI must be -Asp-Arg-Asn-. The identification of the 2 aspartyl residues in T1 is based on the electrophoretic mobility of peptide fragments T1, T3, and T4. It is apparent that even after a prolonged digestion with trypsin, cleavage at the NH2-terminal Lys-Arg sequence is incomplete. To complete the recovery of fragments from this tryptic digestion, free lysine should be identified. In fact, the sixth ninhydrin-positive spot identified during mapping of the digest had the mobility expected for free lysine.
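The tryptic strategy can be illustrated with an idealized cleavage rule applied to the final sequence of Th VI. The sketch assumes cleavage after every Lys or Arg (Arg only when lysines are citraconylated) and no cleavage before Pro; the peptide strings are read from the complete sequence, the assignment of T(c)2 to residues 41 to 49 is an inference from the text, and the function name is hypothetical. Real digests deviate from this rule, as the text itself notes.

```python
# Th VI, residues 37 to 49 of the intact protein (from the complete sequence).
TH_VI = "LKSRKRGDSADRN"
# T(c)2, inferred here to be residues 41 to 49 (it begins with Lys-Arg).
TC2 = "KRGDSADRN"


def trypsin_fragments(seq, block_lys=False):
    """Idealized complete tryptic digest.

    Cleaves on the carboxyl side of Lys and Arg, or of Arg only when
    block_lys is True (modeling citraconylated lysines). Bonds preceding
    Pro are left intact.
    """
    sites = "R" if block_lys else "KR"
    fragments, start = [], 0
    for i, aa in enumerate(seq[:-1]):
        if aa in sites and seq[i + 1] != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])
    return fragments


print("citraconylated Th VI:", trypsin_fragments(TH_VI, block_lys=True))
print("deblocked T(c)2:     ", trypsin_fragments(TC2))
```

The idealized rule predicts cleavage after all three arginines of citraconylated Th VI, whereas only two fragments were observed, so the rule overstates what happened in practice. For deblocked T(c)2 the predicted products include free lysine, free asparagine, and a peptide with NH2-terminal glycine, all of which correspond to spots described in the text.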
Sequence of Th VII - The COOH-terminal thermolysin peptide (300 nmol) was subjected to five cycles of manual Edman degradation. The product of each of the first three cycles was identified unambiguously to yield the partial sequence Tyr-Arg-Ser--. Since the COOH-terminal residue is known to be leucine (Fig. 1), the location of histidine at the fourth position in the peptide may be made confidently by difference. Thus the sequence is Tyr-Arg-Ser-His-Leu.
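Assignment "by difference" amounts to subtracting the residues already placed from the peptide's composition. A minimal sketch of that bookkeeping follows; the composition shown for Th VII is read back from the final sequence rather than from Table I, and the function name is hypothetical.

```python
from collections import Counter


def residues_by_difference(composition, placed):
    """Return the residues of a composition not yet accounted for.

    composition: mapping of residue name to count.
    placed: iterable of residue names already assigned to positions.
    """
    remaining = Counter(composition)
    remaining.subtract(Counter(placed))
    if any(count < 0 for count in remaining.values()):
        raise ValueError("placed residues exceed the composition")
    return sorted(remaining.elements())


# Composition of Th VII implied by the final sequence Tyr-Arg-Ser-His-Leu.
th_vii_composition = {"Tyr": 1, "Arg": 1, "Ser": 1, "His": 1, "Leu": 1}
edman_cycles_1_to_3 = ["Tyr", "Arg", "Ser"]
carboxypeptidase_c_terminus = ["Leu"]

unplaced = residues_by_difference(
    th_vii_composition, edman_cycles_1_to_3 + carboxypeptidase_c_terminus
)
print("residue left for position 4:", unplaced)
```

The same subtraction underlies the assignments made for Th IV and Th V, where the last one or two residues were placed from the composition once the others were known.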
Complete Sequence - The results of this investigation are summarized in Fig. 6, which gives the complete sequence of the testis-specific basic protein of the rat.

Fig. 6. The complete sequence of the testis-specific basic protein from the rat. The location of all of the larger fragments of this protein produced in the course of the sequence analysis is shown. The nomenclature of these fragments is given under "Experimental Procedures." The sequence of the first 11 residues is taken from an earlier publication (3).

DISCUSSION
In the sequence analysis of the rat testis-specific basic protein, automated Edman degradation of both the intact protein (3) and the large COOH-terminal cyanogen bromide fragment CNBr II provided a continuous sequence for over half the length of this small protein.
In addition, this technique served to identify a sufficient number of key residues in the COOH-terminal region of the protein to permit alignment of all the peptides derived from the intact protein by thermolysin digestion. Because of the unusually restricted distribution of hydrophobic residues in this protein, thermolysin proved an ideal agent for its dissection. Quantitative cleavages were obtained at each internal hydrophobic residue. The sole exception to this pattern, partial cleavage between Thr 8 and His 9, is an example of the occasional susceptibility to thermolysin of the bonds on the amino side of histidinyl residues (20,21). These thermolytic peptides were obtained in very high yields considering the use of paper methods during purification. This is probably due to the unusually hydrophilic character of these fragments.
The complete sequence of this protein (Fig. 6) reveals that both the numerous basic residues and the relatively less common hydrophobic residues are rather evenly distributed along the length of the molecule.
While there is a tendency for basic residues to occur in groups of two or three, the hydrophobic amino acids invariably occur alone, and, other than the COOH-terminal leucine, are followed by a basic residue. The 2 aspartyl residues, constituting the only acidic amino acids in the protein, both occur within the 4-residue sequence commencing at position 44. Their presence along with a relative scarcity of basic residues gives the COOH-terminal 10 or so residues a neutral character despite the overwhelming basicity of the remainder of the molecule. Also worthy of note is the existence of an internal homology between two pentapeptide sequences, specifically residues 10 to 14 (Gly-Met-Arg-Arg-Gly) and residues 22 to 26 (Gly-Val-Lys-Arg-Gly).
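These observations can be verified mechanically against the sequence. The sketch below makes several simplifying assumptions that are not the authors': the hydrophobic set is taken as Leu, Met, Val, and Tyr, charge is counted as Lys plus Arg minus Asp plus Glu with histidine and the chain termini ignored, and the "COOH-terminal 10 or so residues" are taken as residues 44 to 54. All names in the code are hypothetical.

```python
# One-letter transcription of the reported sequence (illustration aid).
SEQ = ("STSRKLKTHG" "MRRGKNRAPH" "KGVKRGGSKR"
       "KYRKSSLKSR" "KRGDSADRNY" "RSHL")

HYDROPHOBIC = set("LMVY")  # assumed set; alanine is not counted here
BASIC = set("KR")
ACIDIC = set("DE")


def hydrophobic_contexts(seq):
    """(position, residue, next residue) for each non-terminal hydrophobic residue."""
    return [(i + 1, seq[i], seq[i + 1])
            for i in range(len(seq) - 1) if seq[i] in HYDROPHOBIC]


def net_charge(segment):
    """Lys + Arg minus Asp + Glu; histidine and termini are ignored."""
    basic = sum(aa in BASIC for aa in segment)
    acidic = sum(aa in ACIDIC for aa in segment)
    return basic - acidic


contexts = hydrophobic_contexts(SEQ)
all_followed_by_basic = all(nxt in BASIC for _, _, nxt in contexts)
none_adjacent = all(nxt not in HYDROPHOBIC for _, _, nxt in contexts)

asp_positions = [i + 1 for i, aa in enumerate(SEQ) if aa == "D"]
tail_charge = net_charge(SEQ[43:])   # residues 44 to 54
body_charge = net_charge(SEQ[:43])   # residues 1 to 43

motif_a, motif_b = SEQ[9:14], SEQ[21:26]  # residues 10-14 and 22-26
matches = [a == b for a, b in zip(motif_a, motif_b)]

print("hydrophobic residues followed by Lys/Arg:", all_followed_by_basic)
print("no adjacent hydrophobic residues:", none_adjacent)
print("Asp positions:", asp_positions)
print("net charge, residues 1-43 vs 44-54:", body_charge, tail_charge)
print("internal homology:", motif_a, motif_b, matches)
```

The two pentapeptides are identical at three of five positions, and the two differing positions pair a hydrophobic residue with a hydrophobic residue and a basic residue with a basic residue.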
The testis-specific basic protein has also been isolated in pure form from the human testis (2). Nothing is yet known of the sequence, but the amino acid composition is in accord with only two substitutions of a conservative nature differentiating the proteins from rat and man.
A protein of similar composition and size has been isolated from the testis of the mouse (22). In marked distinction to the testis-specific basic protein of rat (1) and man (2), the protein from mouse was claimed to be present in spermatozoa (22). Recent evidence¹ indicates that mouse spermatozoa, like those of several other eutherian mammals (1,5,6), contain a class of basic chromosomal proteins that is readily distinguishable from the testis-specific basic protein, both by a great excess of arginine over lysine and by a high content of half-cystine.
While it is possible that the association of the testis-specific basic protein with developing gametes may be transitory in some species (rat and man) but more permanent in others (mouse), the findings mentioned above indicate that the characterization of the basic protein from mouse testis and spermatozoa as "mouse protamine" (22) is perhaps premature.
Although nothing is yet known about the function of the testis-specific basic protein, recent experiments² are in accord with the association of this protein with chromatin prepared from detergent-washed testicular nuclei. Since the distinct possibility of a chromosomal origin is raised, it is appropriate to inquire whether the testis-specific basic protein bears a structural relationship to any of the well known basic chromosomal proteins of eucaryotic cells, the histones (23). Full sequences have now been determined for representatives of almost all of the established histone classes (24-29).
Assuming that rat histones, with perhaps the exception of some members of the lysine-rich family, will be very nearly identical with those of other species, the examination of the published structures indicates that the rat testis-specific basic protein could not be derived from any of them by degradation. This finding confirms a conclusion drawn earlier (1) on the basis of less direct evidence.
In addition, it appears that an evolutionary relationship, if any, between the testis-specific basic protein and one or another of the histones must be relatively distant.
The sequence of the testis-specific basic protein shares no substantial regions of overlap with any of them.