Chicken riboflavin-binding protein. cDNA sequence and homology with milk folate-binding protein.

The Rd gene is expressed in the livers and oviducts of laying hens and codes for the riboflavin-binding protein (RfBP) of egg yolk and egg white. A lambda gt11 cDNA library derived from chicken oviduct poly(A)+ RNA was screened with polyclonal rabbit antiserum to chicken RfBP. Positive clones were isolated and rescreened with a mixed oligonucleotide probe corresponding to residues 20-25 of the mature protein. The largest cDNA clone (969 base pairs) was subcloned into plasmid pIBI21, and the nucleotide sequence was determined by the dideoxynucleotide method. This clone contained the entire coding region for RfBP. The published amino acid sequence of the mature protein was confirmed. In addition, the following 17-residue signal peptide was deduced: Met-Leu-Arg-Phe-Ala-Ile-Thr-Leu-Phe-Ala-Val-Ile-Thr-Ser-Ser-Thr-Cys. Unexpectedly, the nucleotide sequence codes for 2 adjacent arginine residues at the carboxyl terminus that are not observed in the mature protein. The amino acid sequence of RfBP is homologous with bovine milk folate-binding protein. Eight of the nine pairs of cysteines involved in disulfide bonds in RfBP are conserved in folate-binding protein, as are all of the tryptophan residues. Sequence identity between homologous regions of these two vitamin-binding proteins is more than 30%.

The faint yellow color of normal egg white is due to riboflavin bound to a specific riboflavin-binding protein (1). This protein (RfBP),¹ constituting 0.8% of egg white protein (2), is also present in egg yolk (3, 4) and in the plasma of laying hens (5). Hens homozygous for the recessive rd allele are unable to produce RfBP and lay eggs deficient in riboflavin (6). Embryos in these eggs die at or about 13 days of development (7).
Mature RfBP is the product of a number of post-translational modifications. In addition to the cleavage of a heretofore uncharacterized signal peptide, its amino terminus is blocked by pyroglutamic acid (8), and an acidic carboxyl-terminal 11-13-residue peptide is removed during or after transport to the yolk (9). There are nine disulfide bonds whose positions have been determined (10), two complex, N-linked oligosaccharides whose composition depends on the tissue of synthesis (5, 8), and a serine-rich region containing eight phosphoryl groups (11-14). These phosphoryl groups are necessary for the transport of serum RfBP into the oocyte (15). The mature 219-amino acid sequence has been determined (8, 9). Crystals have been obtained, and structural analysis is in progress (16).

* This work was supported by the University of Delaware Research Foundation, a National Institutes of Health Biomedical Research Support Group grant, and Abbott Laboratories. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) J03922.

To whom correspondence should be addressed.

¹ The abbreviations used are: RfBP, riboflavin-binding protein; FBP, folate-binding protein.
RfBP is unusual among egg proteins because it is present in both egg yolk and egg white (1,17). This implies that the estrogen-dependent expression of RfBP is distinct from that of other egg proteins such as egg white ovalbumin, synthesized in the oviduct (18,19), or vitellogenin, synthesized in the liver (20).
Homologous riboflavin-binding proteins are present in the eggs of reptiles (21) and other birds (22-24) and in the plasma of pregnant mammals (24, 25). In the latter case, RfBP has been shown to be necessary for the transfer of riboflavin from mother to fetus. Immature female rats immunized with chicken RfBP subsequently develop autoimmunity to their pregnancy-specific RfBP and spontaneously abort (26).
As a first step toward understanding the transcriptional regulation of the Rd gene, determining the molecular basis of the null allele, and providing a probe for isolating the mammalian gene, a cDNA clone coding for chicken RfBP has been isolated, and its nucleotide sequence has been determined.

MATERIALS AND METHODS
cDNA Library-The cDNA library was obtained from Dr. Bert W. O'Malley's laboratory at Baylor Medical School (Houston, TX). The construction of this library in λgt11 from chick oviduct poly(A)+ RNA has been described (27).
Screening for RfBP-producing Clones-Propagation of recombinant λgt11 was conducted in Escherichia coli strain Y1090(r-) (28) obtained from Promega Biotec (Madison, WI). Clones were screened with the Photoblot Immunoscreening System by the method of the supplier (46). Rabbit antibodies to chicken RfBP were obtained from Dr. Robert Guyer (Pennsylvania State University, University Park, PA). Clones expressing RfBP antigens were isolated, and the recombinant λgt11 DNA was purified (29). The sizes of the cDNA inserts were determined by gel electrophoresis of EcoRI-digested DNA in 1% Sea Kem GTG-agarose gels (FMC Bioproducts, Rockland, ME). The largest inserts were rescreened for hybridization with a mixed oligonucleotide probe (30), synthesized on a Systec Microsyn DNA synthesizer. The probe was chosen to be complementary to a region with low degeneracy near the amino terminus of RfBP (residues 20-25).

DNA Sequence Determination-The clone containing the largest insert (~1 kilobase pair) and also hybridizing with the oligonucleotide probe was selected for sequencing. The cDNA was excised from λgt11 and subcloned (31) into the EcoRI site of pIBI21. DNA from recombinant plasmids was isolated (32) and used as a template in the DNA sequencing reaction (32, 33). The M13 17-mer reverse primer (purchased from Pharmacia LKB Biotechnology Inc.) was used to obtain the nucleotide sequence of the 5' end of the gene. Subsequent sequencing reactions were positioned by primers synthesized to correspond to distal regions of the sequence determined in the preceding sequence reaction. The nucleotide sequences of both strands were determined using a total of nine different primers.

RESULTS AND DISCUSSION
Propagation and Detection of Rd cDNA Clones-Riboflavin-binding protein is one of a number of proteins in egg white that have or may have antimicrobial activity (34). Consequently, production of significant amounts of RfBP in E. coli could be lethal or bacteriostatic because the protein could scavenge exogenous and de novo synthesized riboflavin. To avoid this potential selection against the clones of interest, the culture media for propagation and screening were supplemented with riboflavin. This precaution is analogous to that used in screening avidin-producing clones from the same cDNA library by Gope et al. (27).
A total of 30 clones were independently isolated using a polyclonal rabbit antiserum to chicken RfBP. All but four of these λgt11 clones contained cDNA inserts of fewer than 700 base pairs and thus were not large enough to encode the 219 amino acids in RfBP and its expected signal peptide. Only the largest of the four remaining clones hybridized with a mixed oligonucleotide probe corresponding to residues 20-25 of the mature protein. The nucleotide sequence of this clone included the entire coding region of the Rd gene.
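The preference for a low-degeneracy region when designing a mixed oligonucleotide probe can be illustrated with a short calculation. The sketch below counts how many distinct coding sequences a peptide could have under the standard genetic code and picks the least degenerate six-residue window. The function names and the example protein string are hypothetical; the actual residues 20-25 of RfBP are not listed in this text, so the input is illustrative only.

```python
# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def probe_degeneracy(peptide: str) -> int:
    """Return the number of distinct nucleotide sequences that encode `peptide`."""
    total = 1
    for residue in peptide:
        total *= CODON_COUNTS[residue]
    return total


def least_degenerate_window(protein: str, width: int = 6):
    """Return (start_index, peptide, degeneracy) for the least degenerate window."""
    starts = range(len(protein) - width + 1)
    best = min(starts, key=lambda i: probe_degeneracy(protein[i:i + width]))
    peptide = protein[best:best + width]
    return best, peptide, probe_degeneracy(peptide)


# Hypothetical protein fragment, NOT the RfBP sequence.
EXAMPLE_FRAGMENT = "GSLRMWKFDECAL"
best_start, best_peptide, best_count = least_degenerate_window(EXAMPLE_FRAGMENT)
print(best_start, best_peptide, best_count)
```

Windows rich in Met and Trp (one codon each) and poor in Leu, Ser, and Arg (six codons each) give the smallest probe pools, which is the usual reason for choosing such a region.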
This full-length Rd cDNA subcloned in an expression vector did not impair the growth of host cells on unsupplemented media, suggesting that the expressed fusion protein was produced in very small amounts or was unable to bind riboflavin. This conclusion is consistent with the fact that RfBP is a secreted protein with many disulfide bonds. Furthermore, its expression in chicken involves a number of other post-translational modifications unexpected in an intracellular fusion protein in E. coli. The ocher termination codon (TAG) shortly following the 5'-EcoRI site (positions -54 to -52) suggests that no fusion protein would be produced. However, screening with antiserum indicates that fusion protein was produced, perhaps in small quantities, as a result of the amber-ocher suppressor in the E. coli Y1090 host strain.

Nucleotide Sequence Determination-The nucleotide sequence of the Rd cDNA was determined by the dideoxynucleotide method using oligonucleotide primers that successively extended the sequence. Both strands of the cDNA were sequenced. In the region coding for the signal peptide, the nucleotide sequence was difficult to determine. This problem was overcome, and the nucleotide sequence was confirmed by using a primer that abutted this region.
The nucleotide sequence (Fig. 1) is in complete agreement with the amino acid sequence determined by Hamazume et al. (8). The polymorphism they observed in the protein sequence at residue 14 indicated that the predominant allele coded for lysine at that position. This is the allele we have isolated. A 17-amino acid signal peptide is predicted by the nucleotide sequence. The predicted site for signal peptide cleavage precedes adjacent glutamine residues. The resulting amino-terminal glutamine must cyclize, eliminating ammonia, to produce the pyroglutamate found in the mature protein.
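The signal peptide processing described here can be sketched in a few lines. The example converts the deduced 17-residue signal peptide (as given in the abstract) to one-letter code, splits a precursor at the predicted cleavage site, and flags an amino-terminal glutamine as a candidate for cyclization to pyroglutamate. The function names are illustrative, and only the first two residues of the mature protein (Gln-Gln, per the text) are used; the remainder of the mature sequence is deliberately not reproduced.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Signal peptide exactly as stated in the abstract.
SIGNAL_PEPTIDE_3 = (
    "Met-Leu-Arg-Phe-Ala-Ile-Thr-Leu-Phe-Ala-Val-Ile-Thr-Ser-Ser-Thr-Cys"
)


def to_one_letter(three_letter: str) -> str:
    """Convert a hyphen-separated three-letter sequence to one-letter code."""
    return "".join(THREE_TO_ONE[res] for res in three_letter.split("-"))


def cleave_signal(precursor: str, signal_length: int):
    """Split a precursor into (signal peptide, processed chain)."""
    return precursor[:signal_length], precursor[signal_length:]


def n_terminal_state(chain: str) -> str:
    """Report whether the amino terminus can cyclize to pyroglutamate."""
    # An N-terminal Gln cyclizes with loss of NH3 (about 17.03 Da).
    return "pyroglutamate (from Gln)" if chain.startswith("Q") else "unmodified"


signal = to_one_letter(SIGNAL_PEPTIDE_3)
# Only the first two mature residues (Gln-Gln) are taken from the text.
precursor_start = signal + "QQ"
signal_part, mature_start = cleave_signal(precursor_start, len(signal))
print(signal_part, mature_start, n_terminal_state(mature_start))
```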
Although the published amino acid sequence of RfBP (8, 9) agrees with the nucleotide sequence reported here, the nucleotide sequence indicates that the initial translation product contains adjacent arginine residues at the carboxyl terminus. These are not observed in the mature protein; and thus, another peptide processing event must occur in addition to those mentioned in the Introduction. A similar observation has been made with the cDNA sequence for the matrix γ-carboxyglutamic acid-containing protein of bone, dentin, and cartilage, where a carboxyl-terminal Arg-Arg-Gly-Ala-Lys sequence is predicted from the nucleotide sequence, but is not observed in the mature protein (35).
Recent analysis of the nucleotide sequences flanking the ATG initiation codon of almost 700 vertebrate cDNAs reveals the consensus sequence GCCGCCPuCCATGG for initiation of translation (36). Other than the ATG initiation site, the Rd cDNA matches this sequence at positions -2, -3, -5, and -9. This 5'-untranslated region of the Rd cDNA is distinctive in having a polypurine tract just upstream of the initiation codon (positions -25 to -6). Another polypurine tract occurs immediately preceding the termination codon (positions 692-714). Higher than expected frequencies of polypurine tracts of this size and greater have been found in the human β-globin region and several other human gene sequences (37). The significance of these regions is unclear.
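The comparison with the initiation consensus and the search for polypurine tracts are both simple string operations, sketched below. The nine consensus positions upstream of the ATG are written with the IUPAC code R for a purine. The upstream sequence used in the example is hypothetical: it was constructed only to be consistent with the matches reported in the text (positions -9, -5, -3, and -2) and is not the Rd cDNA sequence from Fig. 1. Function names and the 20-nucleotide threshold are likewise assumptions.

```python
import re

# Consensus for positions -9 to -1 (immediately 5' of the ATG); R = purine.
CONSENSUS_UPSTREAM = "GCCGCCRCC"
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG"}


def matching_positions(upstream: str):
    """Return the positions (-9 .. -1) at which `upstream` fits the consensus."""
    if len(upstream) != len(CONSENSUS_UPSTREAM):
        raise ValueError("expected the 9 nucleotides preceding the ATG")
    matches = []
    for offset, (base, code) in enumerate(zip(upstream.upper(), CONSENSUS_UPSTREAM)):
        if base in IUPAC[code]:
            # offset 0 corresponds to position -9, offset 8 to position -1
            matches.append(offset - len(CONSENSUS_UPSTREAM))
    return matches


def polypurine_tracts(sequence: str, min_length: int = 20):
    """Return 0-based (start, end) spans of runs of A/G at least `min_length` long."""
    pattern = re.compile(r"[AG]{%d,}" % min_length)
    return [m.span() for m in pattern.finditer(sequence.upper())]


# Hypothetical flank, built to reproduce the reported pattern of matches.
HYPOTHETICAL_UPSTREAM = "GAAACTACT"
print(matching_positions(HYPOTHETICAL_UPSTREAM))

# Toy sequence containing one 20-nucleotide purine run.
print(polypurine_tracts("CT" + "AG" * 10 + "CT"))
```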
The 3'-untranslated region of Rd cDNA is 120 nucleotides long, with the AATAAA consensus sequence for polyadenylation at positions 814-819 (38). The poly(A) tract in this clone contains 62 adenyl residues.
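As a rough consistency check on the lengths and coordinates quoted in this section, the sketch below recomputes the layout of the cDNA from the stated numbers. It rests on two assumptions that the text does not state explicitly: that nucleotide +1 is the A of the ATG initiation codon, and that the primary translation product is exactly the 17-residue signal peptide, the 219-residue mature sequence, and the two carboxyl-terminal arginines. The 54-nucleotide figure for the 5' end is only a lower bound taken from the reported position of the in-frame TAG. All names are illustrative.

```python
# Values quoted in the text.
SIGNAL_RESIDUES = 17        # deduced signal peptide
MATURE_RESIDUES = 219       # mature protein
EXTRA_ARG_RESIDUES = 2      # Arg-Arg absent from the mature protein
UTR3_LENGTH = 120           # 3'-untranslated region
POLY_A_LENGTH = 62          # adenyl residues in the poly(A) tract
POLYA_SIGNAL = (814, 819)   # AATAAA
MIN_UTR5_LENGTH = 54        # lower bound (TAG at -54 to -52)
CLONE_LENGTH = 969          # base pairs


def rd_cdna_layout():
    """Derive coordinates, assuming +1 is the A of the initiation codon."""
    codons = SIGNAL_RESIDUES + MATURE_RESIDUES + EXTRA_ARG_RESIDUES
    coding_end = 3 * codons                      # last nucleotide of the last sense codon
    stop = (coding_end + 1, coding_end + 3)      # termination codon
    utr3 = (stop[1] + 1, stop[1] + UTR3_LENGTH)  # 3'-untranslated region
    accounted = MIN_UTR5_LENGTH + stop[1] + UTR3_LENGTH + POLY_A_LENGTH
    return {
        "codons": codons,
        "coding_end": coding_end,
        "stop": stop,
        "utr3": utr3,
        "signal_in_utr3": utr3[0] <= POLYA_SIGNAL[0] and POLYA_SIGNAL[1] <= utr3[1],
        "accounted_nt": accounted,
        # Smallest insert that could encode signal peptide plus mature protein.
        "min_insert_nt": 3 * (SIGNAL_RESIDUES + MATURE_RESIDUES),
    }


layout = rd_cdna_layout()
for key, value in layout.items():
    print(key, value)
```

Under these assumptions the last sense codon ends at nucleotide 714, which agrees with the polypurine tract at positions 692-714 being described as immediately preceding the termination codon, and the AATAAA signal falls inside the computed 3'-untranslated region. The 708-nucleotide minimum also agrees with inserts under 700 base pairs being too short for the full coding region.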
Homology with Milk Folate-binding Protein-A search of a protein sequence data bank by Dr. Russell Doolittle (University of California, San Diego, CA) revealed a significant similarity of RfBP to a fragment of bovine milk folate-binding protein (FBP) (39). Subsequent comparison with the completed FBP sequence (40) has extended the region of apparent homology. Alignment to maximize identical residues in the two sequences indicates that RfBP has a carboxyl-terminal extension relative to FBP, whereas FBP is extended at the amino terminus (Fig. 2). In the region from residues 5 to 172, where the sequence clearly corresponds to that of FBP, there are several small insertions or deletions. Sequence identity in this region is more than 30%, a value that greatly exceeds that expected if the identities were due to chance. The presumption of homology is supported by structural and functional considerations. RfBP has nine disulfide bonds. The specific cysteines involved in each disulfide have been determined (10). All but one of these pairs of cysteine residues are conserved in FBP, suggesting that the disulfide bonds in the two proteins are probably the same. Tryptophan residues have been implicated in riboflavin binding to RfBP (41). All 6 of the tryptophan residues in RfBP are conserved in FBP. One of the two N-linked oligosaccharides in each protein occurs at a conserved asparagine residue. These conserved residues account for 23 of the 59 sequence identities between the two vitamin-binding proteins.
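The identity figures in this paragraph can be reproduced with a short calculation. The sketch below defines percent identity over the aligned, ungapped columns of a pairwise alignment and then checks the counts quoted in the text. The toy alignment is hypothetical, since the alignment of Fig. 2 is not reproduced here. The use of 168 as the denominator (residues 5 to 172 of RfBP, inclusive, ignoring gaps) is an assumption; the text states only that identity exceeds 30%.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip insertion/deletion columns
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared


# Hypothetical toy alignment (not taken from Fig. 2).
toy_identity = percent_identity("ACW-KCT", "ACWGRCS")

# Counts quoted in the text.
conserved_cys = 8 * 2          # eight conserved disulfide pairs
conserved_trp = 6              # all tryptophans
conserved_glyco_asn = 1        # one shared N-glycosylation site
conserved_total = conserved_cys + conserved_trp + conserved_glyco_asn

identities = 59
region_length = 172 - 5 + 1    # assumed denominator: residues 5-172 inclusive
region_identity = 100.0 * identities / region_length

print(round(toy_identity, 1), conserved_total, round(region_identity, 1))
```

The three classes of conserved residues sum to the 23 identities cited, and 59 identities over 168 positions gives roughly 35%, in line with the "more than 30%" stated.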
Notably absent in the FBP sequence is the highly anionic region at the carboxyl terminus of RfBP. In RfBP, this region includes a 14-residue sequence that contains 8 phosphorylserine and 5 glutamic acid residues (11, 14). Whereas this region is not found in milk FBP, similar sequences are well known in caseins (42), the major proteins of milk. Whether RfBP acquired this phosphorylated region or FBP lost it in their respective evolution from a common ancestral protein may become evident when the structure of chicken egg FBP is determined. This recently discovered vitamin-binding protein² is one of a number of nutrient-transport proteins now known from chicken egg (43).
The recognition of homology between RfBP and FBP establishes another family of high affinity, soluble, transport proteins. Previously, it has been shown that vitamin D-binding protein is homologous with albumin and α-fetoprotein (44) and that retinol-binding protein is homologous with bilin-binding protein and milk β-lactoglobulin (45). Whereas folates and flavins are distinct in their catalytic functions in cells, they do have structural resemblance in the pterin ring system they share. Thus, it is conceivable that an ancestral protein had the capacity to bind both vitamins or that a few amino acid replacements could interconvert the binding specificities.