The complete primary structure of the cellular retinaldehyde-binding protein from bovine retina.

Cellular retinaldehyde-binding protein (CRALBP) carries 11-cis-retinol and 11-cis-retinaldehyde as endogenous ligands and may be a functional component of the visual cycle. The complete amino acid sequence of CRALBP from bovine retina has been determined by direct microanalysis of the protein. Bovine CRALBP contains 316 residues in a single amino-terminal-blocked chain corresponding to a molecular weight of 36,421, inclusive of the blocking group. Overlapping peptides were generated by cleavage of lysyl, arginyl, methionyl, glutamyl, and one tryptophanyl bond and sequenced by gas-phase Edman degradation. Analysis of amino-terminal arginyl and methionyl peptides by fast atom bombardment mass spectrometry identified the Nα-blocking group as an acetyl moiety, and tandem mass spectrometry provided the sequence of the first 9 residues. Comparison of CRALBP with other known protein sequences reveals no significant structural relatedness. The present results provide a basis for relating CRALBP domains with physiological function and for the future development of a more detailed three-dimensional model of the interaction of 11-cis-retinaldehyde with protein.

a functional component of the visual cycle, perhaps serving as a stereoselective agent and/or substrate carrier protein (7, 8). As part of our continuing study of the role of retinoid-binding proteins in the physiology of the retina, we have determined the primary structure of bovine retinal CRALBP. NH2-terminal analysis of CRALBP is hindered by the presence of a blocking group (4). A combination of fast atom bombardment (FAB-MS) and tandem mass spectrometry (MS/MS) coupled with amino acid analysis has provided the identity of the Nα-blocking group and NH2-terminal sequence. The strategy used in the direct microanalysis of the protein and evidence in support of the complete amino acid sequence of bovine CRALBP are presented here.

RESULTS
The strategy used to determine the complete amino acid sequence of CRALBP from bovine retina is shown in Fig. 1. The single amino-terminally acetylated polypeptide chain contains 316 residues, corresponding to a molecular weight of 36,421 (including the Nα-acetyl group), about 10% larger than the 33,000 determined by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (19). The CRALBP sequence is in agreement with the amino acid composition determined experimentally (Table I). The majority of the sequence of bovine CRALBP was determined by Edman degradation of peptides generated by cleavage at lysyl residues. The alignment of the lysyl peptides was obtained by cleavage of CRALBP at methionyl, arginyl, and glutamyl bonds and by one tryptophanyl peptide generated during cyanogen bromide cleavage. The Nα-acetyl blocking group and the sequence of the first 9 residues were determined by mass spectral analysis of the NH2-terminal arginyl and methionyl peptides. The COOH terminus was identified both by carboxypeptidase Y digestion and by Edman degradation of lysyl and arginyl peptides lacking a COOH-terminal lysine or arginine, respectively.
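The overlap strategy described above can be illustrated with a minimal in-silico sketch. It assumes each reagent cuts on the carboxyl side of a single residue type and ignores incomplete cleavage. The 20-residue `TOY` sequence is hypothetical: only its first nine residues and its last three come from this paper, and the middle is invented for illustration. The function names are likewise illustrative.

```python
# Hypothetical toy sequence: the first 9 residues and the final Thr-Ala-Phe
# are reported in the text; the 8 residues in between are invented.
TOY = "SEGAGTFRM" + "AKLEVRGK" + "TAF"


def digest(sequence, residue):
    """Return (start, end) half-open spans of peptides cut after `residue`."""
    spans, start = [], 0
    for i, aa in enumerate(sequence):
        if aa == residue:
            spans.append((start, i + 1))
            start = i + 1
    if start < len(sequence):
        spans.append((start, len(sequence)))
    return spans


def bridging_peptides(sequence, primary, others):
    """For each junction of the primary digest, list the peptides from the
    other digests that contain residues on both sides of that junction."""
    junctions = [end for _, end in digest(sequence, primary)[:-1]]
    result = {}
    for j in junctions:
        result[j] = [
            (res, sequence[s:e])
            for res in others
            for s, e in digest(sequence, res)
            if s < j < e
        ]
    return result


if __name__ == "__main__":
    print([TOY[s:e] for s, e in digest(TOY, "K")])
    for junction, bridges in bridging_peptides(TOY, "K", ["R", "M", "E"]).items():
        print(junction, bridges)
```

In this toy case every lysyl junction is bridged by at least one arginyl, methionyl, or glutamyl peptide, which is the property the real proof of sequence relies on.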
Lysyl Peptides-The sequence of 281 residues was determined by Edman degradation of 13 peptides that were generated by treatment of pyridylethyl CRALBP with endoproteinase Lys-C and purified by RP-HPLC (Fig. 2). Lysyl peptides are numbered K1, K2, etc. from the amino terminus except for peptides containing uncleaved lysyl peptide bonds (e.g. K3/5 and K12/13). Purification yields and amino acid compositions are shown in Table II. Sequence analyses of peptides from another endoproteinase Lys-C digest (Fig. 3) of
Portions of this paper (including "Experimental Procedures," Figs. 2-11, and Tables I-IV) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are included in the microfilm edition of the Journal that is available from Waverly Press.

FIG. 1. Summary of the proof of the sequence of bovine CRALBP. The determined sequences of specific peptides are indicated by a solid line. Prefixes K, R, E, M, and CBW denote peptides generated by cleavage at lysyl, arginyl, glutamyl, methionyl, and tryptophanyl residues, respectively. Peptides are numbered sequentially from the amino terminus except where an uncleaved residue gives an overlap (e.g. K3/5). All peptide sequences were proven by Edman degradation except R1 and M1 where tandem mass spectrometry was used. Ac denotes an acetyl group that was identified by FAB-MS.
Arginyl Peptides-Succinylated and pyridylethylated CRALBP was digested with trypsin and peptides purified by RP-HPLC (Fig. 4). Several of the arginyl peptides used in the proof of the CRALBP structure required rechromatography before unambiguous sequences could be determined (Fig. 5). Purification yields and amino acid compositions are shown in Table III. The sequence of 188 residues was determined by analysis of 15 arginyl peptides (Fig. 1), and overlaps were obtained between K2 through K6 and between K15 through K17.
Cyanogen Bromide Peptides-Succinylated and pyridylethylated CRALBP was cleaved at methionyl residues with cyanogen bromide, and peptides were purified by RP-HPLC at neutral pH (Fig. 6). Except for the tripeptide M5, the other five cyanogen bromide peptides were recovered, in addition to a tryptophanyl peptide (labeled CBW) generated during the cleavage. Peptides M2 and M3 required rechromatography prior to sequence analysis (Fig. 6, inset). Purification yields and amino acid compositions of selected methionyl peptides are shown in Table IV. The sequence of 109 residues was determined from analyses of the cyanogen bromide peptides, yielding strong overlaps between R2 through R4 and K2, between K8 and K9, and between K11 and K13. In addition, CBW linked K12/13 to K15 by a 2-residue overlap.
Glutamyl Peptides-To obtain the final overlaps, succinylated and carboxymethylated CRALBP was digested with Staphylococcus aureus V8 protease and the peptides purified by RP-HPLC (Fig. 7). Glutamyl peptides E18 and E31 required rechromatography (Fig. 8, top and middle) in order to obtain unambiguous sequence results. Amino acid compositions and yields of selected glutamyl peptides are shown in Table IV. The sequence of 85 residues was determined from four glutamyl peptides (Fig. 1), yielding strong overlaps between M1 and M2 and R1 and R2/3, between K5 and K7, between K7 and K8, and between K9 and K11. Glutamyl peptide E2/5 was also isolated (Fig. 8, bottom) and sequenced from a V8 protease subfragmentation of lysyl peptide K1.
Identification of the NH2-terminal Structure-FAB-MS revealed the chemical nature of the Nα-blocking group and corroborated the composition of arginyl peptide R1. The FAB mass spectrum of 400 pmol of the blocked peptide R1 exhibited two sets of (M+H)+/(M+Na)+ peaks at m/z 743/765 and 866/888, respectively (Fig. 9A). No sequence-informative fragments were observed. The arithmetical difference between the intense signal at m/z 866 and the molecular weight value 824 calculated for a peptide with the amino acid composition of R1 (Table I) strongly suggested R1 to be an acetyl-blocked octapeptide with a composition of Glu,Ser,Gly2,Arg,Thr,Ala,Phe. The signal at m/z 743 is apparently unrelated to the blocked peptide and of unknown origin.
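The mass arithmetic behind this assignment can be checked with a short sketch. It uses standard monoisotopic residue masses, which are not values given in this paper, and the sequence later established for R1. The source does not state which mass convention its calculated value of 824 follows, so the sketch simply compares the protonated free and protonated acetylated peptides; all names are illustrative.

```python
# Standard monoisotopic masses in daltons (assumed here, not from the paper).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "T": 101.04768,
    "E": 129.04259, "F": 147.06841, "R": 156.10111,
}
WATER = 18.01056
ACETYL = 42.01056       # C2H2O increment from N-acetylation
PROTON = 1.00728
SODIUM_ION = 22.98922   # Na+ (sodium atom minus one electron)

R1 = "SEGAGTFR"


def neutral_mass(sequence, n_acetyl=False):
    """Neutral monoisotopic mass of a linear peptide, optionally N-acetylated."""
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return mass + ACETYL if n_acetyl else mass


free_mh = neutral_mass(R1) + PROTON
blocked_mh = neutral_mass(R1, n_acetyl=True) + PROTON
blocked_mna = neutral_mass(R1, n_acetyl=True) + SODIUM_ION

if __name__ == "__main__":
    print(f"free    (M+H)+  {free_mh:.2f}")
    print(f"blocked (M+H)+  {blocked_mh:.2f}")
    print(f"blocked (M+Na)+ {blocked_mna:.2f}")
    print(f"difference      {blocked_mh - free_mh:.2f}")
```

Under these assumptions the acetylated peptide gives nominal values of 866 and 888 for the protonated and sodiated species, matching the pair of peaks reported for R1, and the 42-dalton difference is the acetyl increment.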
The sequence of the 8-residue blocked R1 peptide was determined by MS/MS using a VG ZAB SE-4F four-sector magnetic deflection mass spectrometer (10, 15). The first mass analyzer selected the (M+H)+ = 866.4 parent ion produced by FAB ionization of 500 pmol of arginyl peptide R1. The daughter ion spectrum generated in the second mass analyzer (Fig. 10) exhibited a complete series of sequence fragments originating from the carboxyl terminus (Wn, Xn, Yn, and Zn) and extending to the acetylated amino terminus. In addition, a complete series of acylium fragments (Bn) from the amino terminus is also present. These fragments establish the sequence of the first 8 amino acids of R1 to be acetyl-Ser-Glu-Gly-Ala-Gly-Thr-Phe-Arg and demonstrate that the acetyl group is amide-linked to the amino terminus. Other possible arrangements of the residues were ruled out by the absence of one or more of the expected fragment ions in the daughter ion spectrum.
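As an illustration of how such fragment series encode the sequence, the following sketch computes singly charged b-type (acylium) and y-type ion masses for acetyl-Ser-Glu-Gly-Ala-Gly-Thr-Phe-Arg. It is a simplified model under stated assumptions: standard monoisotopic masses (not taken from the paper), only the b and y series (the w, x, and z series are omitted), and illustrative function names. It does not reproduce the spectrum in Fig. 10.

```python
# Standard monoisotopic masses in daltons (assumed here, not from the paper).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "T": 101.04768,
    "E": 129.04259, "F": 147.06841, "R": 156.10111,
}
WATER = 18.01056
ACETYL = 42.01056
PROTON = 1.00728

R1 = "SEGAGTFR"


def b_ions(sequence, n_term_mod=ACETYL):
    """Singly charged b-ion masses b1..b(n-1), with an N-terminal modification."""
    masses, running = [], n_term_mod + PROTON
    for aa in sequence[:-1]:
        running += RESIDUE_MASS[aa]
        masses.append(running)
    return masses


def y_ions(sequence):
    """Singly charged y-ion masses y1..y(n-1), counted from the C terminus."""
    masses, running = [], WATER + PROTON
    for aa in reversed(sequence[1:]):
        running += RESIDUE_MASS[aa]
        masses.append(running)
    return masses


if __name__ == "__main__":
    for i, (b, y) in enumerate(zip(b_ions(R1), y_ions(R1)), start=1):
        print(f"b{i} {b:9.3f}    y{i} {y:9.3f}")
```

Successive differences within either series equal single residue masses, so a complete series fixes the residue order. Because the acetyl group shifts every b ion but no y ion, the two series together also locate the modification at the amino terminus.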
Mass analysis of blocked methionyl peptide M1 (500 pmol) corroborated the results obtained with the blocked arginyl peptide. The FAB mass spectrum of M1 (Fig. 9B) exhibits two molecular weight-related clusters at m/z 1010 and 1038. These correspond respectively to the (M+H)+ for a peptide molecular weight of 1009 and the (M+H)+ for a formylated peptide of the same composition. The latter are common in the spectra of peptides obtained from CNBr digests employing formic acid as the solvent. The observed molecular mass of 1009 was 61 daltons more than the 948 expected for the 9-residue NH2-terminal acetyl, COOH-terminal homoserine peptide. Apparently ethanolamine (molecular weight 61), which was added to the dry CNBr peptides to improve peptide solubility and RP-HPLC separation (11), reacted with the COOH-terminal homoserine lactone to yield homoserine ethanolamide and higher molecular weight compounds. The sequence of M1 was determined by MS/MS. Again, a complete NH2-terminal B series was obtained with masses identical to those observed for peptide R1 with the addition of an intense fragment corresponding to B8 (spectrum not shown). Similarly, the series of COOH-terminal fragments W, X, Y, and Z were all observed offset to higher mass by 144 daltons, the incremental mass of the homoserine ethanolamide residue. These data establish the sequence of the NH2-terminal CNBr fragment of CRALBP as acetyl-Ser-Glu-Gly-Ala-Gly-Thr-Phe-Arg-Met.
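The bookkeeping for M1 can be followed in a small sketch. It assumes standard monoisotopic masses (not values from the paper) and assumes that the expected value of 948 refers to the COOH-terminal homoserine lactone form, which is the form that reproduces that number here. The function name and dictionary keys are illustrative.

```python
# Standard monoisotopic masses in daltons (assumed here, not from the paper).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "T": 101.04768,
    "E": 129.04259, "F": 147.06841, "R": 156.10111,
}
HOMOSERINE = 101.04768    # homoserine residue, C4H7NO2
WATER = 18.01056
ACETYL = 42.01056
PROTON = 1.00728
ETHANOLAMINE = 61.05276   # C2H7NO
FORMYL = 27.99491         # CO increment from formylation


def m1_masses():
    """Masses for acetyl-SEGAGTFR followed by a modified COOH-terminal homoserine."""
    core = ACETYL + sum(RESIDUE_MASS[aa] for aa in "SEGAGTFR")
    # Lactone: the terminal water of the free acid is lost on ring closure.
    lactone = core + HOMOSERINE
    # Ring opening by ethanolamine adds the whole ethanolamine molecule.
    ethanolamide = lactone + ETHANOLAMINE
    return {
        "lactone": lactone,
        "ethanolamide": ethanolamide,
        "ethanolamide_mh": ethanolamide + PROTON,
        "formylated_mh": ethanolamide + FORMYL + PROTON,
        "c_terminal_residue": HOMOSERINE + ETHANOLAMINE - WATER,
    }


if __name__ == "__main__":
    for name, mass in m1_masses().items():
        print(f"{name:20s} {mass:8.2f}")
```

With these assumptions the nominal values come out as 948, 1009, 1010, 1038, and 144, in line with the figures quoted in the paragraph above.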
Identification of the COOH Terminus-Carboxypeptidase Y digestion of intact pyridylethylated CRALBP suggested the COOH-terminal sequence to be . . .Thr-Ala-Phe-COOH (Fig. 11). Only one lysyl peptide lacking lysine (K17) was isolated, and this peptide was recovered in high yield (about 62%); its COOH-terminal sequence, as determined by Edman degradation, agrees with the carboxypeptidase Y-deduced sequence. Edman degradation of arginyl peptide R22 also supports the COOH-terminal sequence of CRALBP to be . . .Thr-Ala-Phe-COOH. Peptide R22 was isolated as K17/R22 apparently due to both incomplete succinylation and incomplete tryptic digestion.

DISCUSSION
Direct proof of the complete structure of CRALBP from bovine retina was derived from overlapping peptides generated by cleavage at lysyl, arginyl, methionyl, and glutamyl residues. One tryptophanyl peptide produced during cyanogen bromide cleavage was also utilized in establishing an overlap. Fast atom bombardment and tandem mass spectrometry provided the data necessary to assign the first 9 residues and to determine that the protein is Nα-acetyl blocked. These results highlight the sensitivity (picomole level) and efficacy of mass spectrometry in the microcharacterization of post-translationally modified proteins (15). Eighty-nine percent of the 316 amino acid residues in the protein were identified in more than one peptide. The weakest parts of the direct structural determination were the variable phenylthiocarbamyl amino acid analyses and the 2-residue overlap between peptides CBW and K15. The complete sequence was independently confirmed, however, by analysis of the cDNA encoding bovine CRALBP (21).
Compared with other protein sequences, CRALBP belongs neither to the superfamily of lipophile-binding proteins that includes cellular retinol-binding protein, cellular retinoic acid-binding protein, peripheral nerve myelin P2 protein, and several fatty acid-binding proteins (22), nor to the superfamily that includes serum retinol-binding protein (23). Furthermore, CRALBP is not related to other retinoid-binding proteins such as interphotoreceptor retinoid-binding protein (24), rhodopsin, and the cone visual pigments (25). In fact, CRALBP exhibits no structural relatedness with any other presently known protein sequence in the Protein Identification Resource data base. Given the finite number of proteins and the ever accelerating rate at which new protein sequences are determined/deduced, it is likely that a protein superfamily including CRALBP will become evident in the not distant future (26).
No information is yet available concerning the retinoid-binding site in CRALBP, though most likely the water-insoluble ligand is sequestered within a hydrophobic domain. Hydropathy in CRALBP estimated according to Hopp and Woods (17) agrees in general with that predicted by Kyte and Doolittle (16); however, the strongest uninterrupted hydrophobic region is predicted by Hopp and Woods and falls between residues 238 and 250 (21). Secondary structure predictions suggest that this hydrophobic domain contains more β-sheet than α-helix (21). Crystallization of CRALBP with bound 11-cis-retinaldehyde and analysis by x-ray diffraction could lead to the identification of the retinoid-binding site. Recent work suggests a common structural motif in three proteins of the retinol-binding protein superfamily (27). Compared with rhodopsin and other membrane-bound visual pigments, the water solubility of the CRALBP-retinoid complex indeed renders it a reasonable candidate for crystallization. In this regard, CRALBP may provide a useful three-dimensional structural model for the interaction of 11-cis-retinaldehyde with protein.
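A hydropathy profile of the kind referred to above can be sketched as a sliding-window average. The example below uses the published Kyte and Doolittle scale values only; the Hopp and Woods scale is not implemented. The window length and the demonstration sequence are hypothetical choices for illustration and are not taken from CRALBP or from the analysis in reference 21.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of every full window along the sequence."""
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def most_hydrophobic_window(sequence, window=7):
    """Return (start index, segment, mean score) of the highest-scoring window."""
    profile = hydropathy_profile(sequence, window)
    start = max(range(len(profile)), key=profile.__getitem__)
    return start, sequence[start:start + window], profile[start]


if __name__ == "__main__":
    # Hypothetical sequence: a hydrophobic stretch flanked by acidic residues.
    demo = "DDDDIIIIIDDDD"
    print(most_hydrophobic_window(demo, window=5))
```

Applied to a real sequence, the start index of the top-scoring window is zero-based and would need to be shifted by one to match conventional residue numbering.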
The physiological role of CRALBP is not yet fully understood. However, several lines of evidence suggest that the protein is a functional component of the visual cycle. First, CRALBP has only been found in retina and pineal, both light-sensitive tissues (1). Second, the endogenous ligands bound by CRALBP are 11-cis-retinol and 11-cis-retinaldehyde, retinoids that are only known to function in the visual process. Third, CRALBP interacts specifically with the visual cycle enzymes 11-cis-retinol dehydrogenase (7) of retinal pigment epithelium, catalyzing the reduction of 11-cis-retinaldehyde bound to CRALBP, and retinyl ester synthase, yielding retinyl esters (28). Finally, CRALBP is able to select 11-cis-retinaldehyde from a mixture of vitamin A stereoisomers and, relative to rhodopsin, protect it from photoisomerization, suggesting a role for the protein in the generation of 11-cis-retinoids (8).