Characterization and Comparative Structural Features of the Gene for Human Interstitial Retinol-binding Protein*

We have cloned the gene for human interstitial retinol-binding protein (IRBP) and compared its nucleotide sequence with that of the corresponding cloned cDNA.
The human IRBP gene is ~9.5 kilobase pairs (kbp) in length and consists of four exons separated by three introns.
The introns are 1.6-1.9 kbp long. The gene is transcribed by photoreceptor and retinoblastoma cells into an ~4.3-kilobase mRNA that is translated and processed into a glycosylated protein of 135,000 Da. The amino acid sequence of human IRBP can be divided into four contiguous homology domains with 33-38% identity, suggesting a series of gene duplication events. In the gene, the boundaries of these domains are not defined by exon-intron junctions, as might have been expected. The first three homology domains and part of the fourth are all encoded by the first large exon, which is 3,180 base pairs long. The remainder of the fourth domain is encoded in the last three exons, which are 191, 143, and ~740 base pairs long, respectively. This unusual structure is shared with the bovine IRBP gene. A large (1.7 kbp) fragment appears to have been lost from the 3'-noncoding region of the last human exon. We conclude that the human and bovine genes have similar evolutionary histories.
Interstitial retinol-binding protein (IRBP) is an elongated glycoprotein found in the eyes of all vertebrates (1-8). Its size averages 134,200 Da except in bony fishes, where it is about half this value (9). IRBP is the major protein constituent of the interphotoreceptor matrix, where it is believed to play an important role in the extracellular transfer of retinoids (all-trans-retinol, 11-cis-retinol, 11-cis-retinal) between the retina and adjacent pigment epithelium during light and dark adaptation (10).
IRBP is synthesized and secreted into the interphotoreceptor matrix by the retinal photoreceptors (5,11,12). The undifferentiated neoplastic cells of retinoblastoma tumors and established cell lines derived from them also secrete IRBP (13-15), which has also been demonstrated in mammalian pineal glands (16-18).
* This work was supported in part by the Retina Research Foundation of Houston and National Institutes of Health Grants EY 02723, EY 07008, and EY 02489. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We have purified and characterized human IRBP (4) and localized its gene to chromosome 10 (19), where the IRBP locus has been shown to be closely linked to that of multiple endocrine neoplasia type 2a (20,21). Recently, we cloned the full-length cDNA sequence for human IRBP and demonstrated that the conceptually translated protein sequence is composed of four contiguous homology domains (302-310 residues in length) with 33-38% identity (22). Certain synthetic peptides based on this sequence have been shown to cause uveitis in rats (23). IRBP binds two molecules of all-trans- or 11-cis-retinol (3,7), and we have suggested that hydrophobic amino acids in the four homologous regions of the molecule participate in the formation of two folding domains that constitute two retinol-binding sites (23). In this work, we report the structure and nucleotide sequence for the entire human gene and part of its flanking region. This provides the basis for future detailed studies on those elements of the IRBP gene structure that regulate its expression in the normal and diseased eye and in retinoblastoma cells.

RESULTS AND DISCUSSION
The strategy used to sequence the human IRBP gene is shown in Fig. 1. A comparison of the genomic sequence with our published cDNA sequence (22) reveals that the coding region of the gene is composed of four exons interrupted by three introns. The positions of the four protein homology domains identified previously (22) are indicated above the exon-intron structure. The boundaries of these domains are not defined by exon-intron junctions, as might have been expected. The first three homology domains and part of the fourth are all encoded by the first large (3180 bp) exon, with the remainder of the fourth domain being encoded in the last three exons, which are 191, 143, and ~740 bp in length, respectively.
The entire DNA sequence for the human IRBP gene is shown in Fig. 2. This sequence was used to correct 31 nucleotide positions in the published cDNA sequence (Table I). The sequence starts at the putative cap site, and the positions of the presumptive translation initiation codon (22), the codon for the N-terminal glycine of the mature protein (6,31,32), and the TAG translation terminating codon are underlined. The TAG codon is located 87 bases upstream from that reported previously (22). The first of the two closely spaced AATAAA polyadenylation consensus sequences (33) is separated from the cap site by 9499 nucleotides. An SstI restriction fragment from the genomic clone HGL.1 was found to overlap with clone HGL.3 and was used to extend our knowledge of the sequence by a further 700 bases in the 3'-direction (data not shown). However, no other AATAAA sequences were found. Excluding any contribution from the poly(A) tract, we predict an mRNA of ~4.25 kb. Fig. 3 Table II. The donor and acceptor splice junction sequences agree closely with published consensus sequences (34), with the 5'- and 3'-ends of each intron conforming to the GT/AG rule. Table III indicates the positions of possible 3'-splice signal sequences corresponding to lariat branch sites that appear to be conserved in animal introns (35).
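The GT/AG rule referred to above can be illustrated with a minimal Python sketch. It only tests whether an intron, written 5' to 3' on the sense strand, begins with GT and ends with AG. The function name and the toy sequences are hypothetical and are not taken from the IRBP sequence in Fig. 2; the check says nothing about how well the wider junction matches the published consensus (34).

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """Return True if an intron (sense strand, 5'->3') starts with GT and ends with AG."""
    seq = intron.strip().upper()
    # Need at least four bases so the donor and acceptor dinucleotides do not overlap.
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical toy introns for illustration only (not IRBP sequence).
toy_introns = {
    "toy_intron_a": "GTAAGTCTCTAACTTTTCCCCTTTTCAG",
    "toy_intron_b": "GCAAGTCTCTAACTTTTCCCCTTTTCAG",  # donor site is GC, so the rule fails
}

for name, seq in toy_introns.items():
    print(name, follows_gt_ag_rule(seq))
```

Applied to the three human introns, such a check would be expected to return True for each, since the text states that every intron conforms to the rule.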
Total and polyadenylated RNAs prepared from batches of human retinas were used to map the 5'-end of the human IRBP transcript using the techniques of primer extension and S1 nuclease protection. As shown in Fig. 4, at least three transcriptional start points appear to be present. Both methods place them at identical positions on the sequence of Fig. 2, where they occur at nucleotides +1 (adenine), +7 (adenine), and +15 (thymidine). The A residue at nucleotide +7 is one nucleotide downstream from the putative cap site for the bovine gene (36), which corresponds to the C residue at nucleotide +6 in the human sequence. The reason for the relative differences in band intensities obtained from the two techniques was not determined. It should be noted, however, that the two experiments utilized RNA preparations from two different batches of human retinas. Multiple initiation sites are often observed in genes that lack a TATA box (37), a promoter element that we have been unable to find in the human IRBP gene and that also appears to be absent in the bovine gene (36). We also mapped the 5'-end of the IRBP transcript expressed in one of the retinoblastoma cell lines used in a previous study (15) and for the Northern blot in Fig. 3. The primer extension technique again shows that there are at least three transcriptional start sites at positions identical to those found for normal human retina IRBP RNA (Fig. 4, upper). The major band is observed at nucleotide +7.
Homology of Human and Bovine IRBP Genes. Fig. 5 is a dot matrix comparison of the structure and homology of the human and bovine genes. They are very similar, each consisting of four exons interspersed with three introns. The human introns have lengths of 1785, 1860, and 1606 nucleotides, respectively, compared with 2230, 1961, and 1491 nucleotides in the bovine gene. Exons 2 and 3 in the human and bovine genes are identical in length (191 and 143 nucleotides, respectively), whereas the first exons differ by only seven nucleotides, i.e. 3180 in human and 3173 in bovine. Exon 4 differs greatly: it is about 1700 bases shorter in the human gene (~740 for human, 2447 for bovine), and this accounts for the larger size of the bovine IRBP transcript (Fig. 3) (19,38). The dot matrix plot in Fig. 5 suggests that this difference is due to loss by the human gene of a large 1700-base segment of DNA between the translation terminating codon and the 3'-AATAAA consensus sequence.
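As a consistency check on the lengths quoted above, the short Python sketch below sums the exon and intron sizes for both genes. The variable and function names are illustrative. The human exon 4 value (740) is approximate in the text, so the totals are approximate as well, and the transcript lengths exclude any poly(A) tract.

```python
# Exon and intron lengths (nucleotides) as quoted in the text.
HUMAN_EXONS = [3180, 191, 143, 740]    # exon 4 is approximate (~740)
HUMAN_INTRONS = [1785, 1860, 1606]
BOVINE_EXONS = [3173, 191, 143, 2447]
BOVINE_INTRONS = [2230, 1961, 1491]


def transcript_length(exons):
    """Spliced transcript length: the sum of the exons, without poly(A)."""
    return sum(exons)


def gene_length(exons, introns):
    """Span of the transcribed region: exons plus introns."""
    return sum(exons) + sum(introns)


human_mrna = transcript_length(HUMAN_EXONS)            # 4254
human_gene = gene_length(HUMAN_EXONS, HUMAN_INTRONS)   # 9505
bovine_mrna = transcript_length(BOVINE_EXONS)          # 5954
exon4_difference = BOVINE_EXONS[3] - HUMAN_EXONS[3]    # 1707

print(f"human mRNA ~{human_mrna / 1000:.2f} kb, human gene ~{human_gene / 1000:.1f} kbp")
print(f"bovine mRNA ~{bovine_mrna / 1000:.2f} kb, exon 4 difference {exon4_difference} nt")
```

The human totals (about 4.25 kb of exon sequence and about 9.5 kbp overall) agree with the predicted mRNA size and the gene length given in the text, and the exon 4 difference of about 1.7 kb matches the segment inferred to be missing from the human 3'-noncoding region.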
Fig. 4. Upper, primer extension. The two right-hand lanes contain the primer-extended cDNAs obtained by using polyadenylated RNAs from adult human retina (Ret) or retinoblastoma cell line RB522A (Rb). The primer was a 23-base oligonucleotide complementary to residues -58 to -80 of the cDNA sequence (22) (residues 47-69 of the gene sequence in Fig. 2). Lower, S1 nuclease protection. The right-hand lane shows the products of S1 nuclease digestion of human retina total RNA hybridized with a probe generated from an SstI genomic fragment and the same 32P-labeled 23-base oligonucleotide used in the primer extension experiment. The sequencing ladders were obtained from the same labeled 23-base primer and the SstI genomic fragment used for the S1 nuclease protection experiment.
Fig. 6 presents a sequence comparison of the human and bovine IRBPs. Excluding the unmatched regions at the N and C termini of the human and bovine proteins, respectively, but including a 2-residue gap introduced into the bovine sequence, there is an 84.2% identity over a length of 1225 residues. This value is somewhat lower than the 87% identity found previously (22) on the basis of alignment with 605 bovine IRBP residues determined by amino acid sequencing of the tryptic peptides and reflects some variation in the degree of identity over the length of the sequence. Human IRBP is an N-linked glycoprotein (4), and two putative N-X-T glycosylation sites can be identified in the human sequence (filled circles). These occur in the same position in the bovine sequence (open circles), which contains a total of five possible sites, as predicted by an earlier study (40). The positions of the four homology domains assigned to the human protein (22)
The determination of the complete nucleotide sequence of the human IRBP gene has now permitted a comparison of the IRBP gene in two mammalian species and has revealed that they are remarkably similar.
Their unusual structures (in particular, the very large exon that encodes the first three homology domains and part of the fourth) suggest they have had a complicated history. Future comparative studies are likely to provide new insights into the mechanism of evolution of the IRBP gene as well as the function of the protein it encodes.
codon at positions 3094-3096 codes for Q and not E. Finally, using human retina RNA, the authors report 5 transcription initiation sites corresponding to nucleotides +1, +7, +13, +21, and +24 in our sequence. They assign a preferred site to nucleotide +7. We used RNA from an IRBP-expressing retinoblastoma cell line as well as adult human retinas and found at least 3 transcription initiation sites at +1, +7, and +15.