Structure of the Human Prealbumin Gene*

Using cloned human prealbumin cDNA as a probe, Southern blot hybridization of human genomic DNA revealed that the prealbumin gene consists of a unique, single-copy DNA. The nucleotide sequence of the entire human prealbumin gene, including 581 base pairs of the 5'-flanking and 95 base pairs of the 3'-flanking sequences, was determined. The gene spans about 7.0 kilobase pairs and consists of four exons and three introns. As in most eukaryotic genes, the consensus TATA and CAAT sequences are found 30 and 101 nucleotides, respectively, upstream from the putative cap site, and a polyadenylation signal sequence, AATAAA, is found in the 3'-untranslated region. Unexpectedly, two independent open reading frames, each provided with its own regulatory sequences, were found within the gene: one in the first intron and the other in the third intron.

On the other hand, extracellular deposition of a variant prealbumin is apparently associated with a hereditary disease, familial amyloidotic polyneuropathy (FAP) (4, 5). Recent studies have shown that the amyloid fibril proteins derived from the Japanese (6), Portuguese (7), and Swedish (8) types of FAP all consist of the same prealbumin variant with a single amino acid substitution, i.e. valine at position 30 is substituted by methionine. To facilitate pre- and postnatal diagnoses of this intractable disease, we recently isolated and sequenced a human prealbumin cDNA (9). Although the single amino acid substitution is apparently linked to the Japanese type of FAP, the molecular and genetic basis for the accumulation of a variant prealbumin in this disease has not been established. To study the control mechanism of prealbumin gene expression and to elucidate the molecular pathogenesis of FAP, we isolated human prealbumin genomic DNAs and determined the entire nucleotide sequence of the prealbumin gene.

*This work was supported in part by a grant-in-aid from the Ministry of Education, Science, and Culture of Japan. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

RESULTS AND DISCUSSION
Southern Blot Analysis of Total Human Genomic DNA-Southern blot analysis of DNAs extracted from human placenta was performed using two kinds of 32P-labeled prealbumin cDNA probes: one is a PstI/PvuII fragment derived from the pPA1 cDNA insert (9) that covers the whole prealbumin cDNA, and the second is a PstI/XbaI fragment covering only the 5' part of the cDNA (9). When the total probe, i.e. the PstI/PvuII fragment, was used, the EcoRI or HindIII digests of human DNA each yielded two bands and the EcoRI/HindIII double digest yielded three bands on the autoradiogram (Fig. 1a). On the other hand, when the 5' part of the pPA1 cDNA insert was used as a probe, all three differently digested human DNAs yielded only one band (Fig. 1b). Because the prealbumin cDNA contains no EcoRI or HindIII sites (9), this result suggests that there is only one prealbumin gene and that it contains at least two introns. This idea was supported by the isolation of phage clones containing the whole human prealbumin gene (see below).
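The band-counting argument above can be illustrated with a minimal sketch. All coordinates below (exon positions, cut sites, region length) are hypothetical values chosen only to reproduce the qualitative pattern; they are not the real restriction map. The function name `hybridizing_fragments` is likewise an illustrative helper.

```python
def hybridizing_fragments(region_length, cut_sites, probed_exons):
    """Return the restriction fragments that overlap at least one probed exon."""
    bounds = [0] + sorted(cut_sites) + [region_length]
    fragments = list(zip(bounds[:-1], bounds[1:]))
    return [
        (f_start, f_end)
        for f_start, f_end in fragments
        if any(e_start < f_end and e_end > f_start for e_start, e_end in probed_exons)
    ]

# Hypothetical layout (NOT the real map): four exons in a 10-kb region.
REGION_LENGTH = 10000
EXONS = [(1000, 1100), (2000, 2150), (4500, 4650), (8000, 8250)]
FULL_PROBE = EXONS            # probe spanning the whole cDNA
FIVE_PRIME_PROBE = EXONS[:2]  # probe spanning only the 5' part of the cDNA

# Hypothetical cut sites, all lying within introns because the cDNA has none.
SINGLE_DIGEST = [3000]
DOUBLE_DIGEST = [3000, 6000]

print(len(hybridizing_fragments(REGION_LENGTH, SINGLE_DIGEST, FULL_PROBE)))        # 2
print(len(hybridizing_fragments(REGION_LENGTH, DOUBLE_DIGEST, FULL_PROBE)))        # 3
print(len(hybridizing_fragments(REGION_LENGTH, SINGLE_DIGEST, FIVE_PRIME_PROBE)))  # 1
print(len(hybridizing_fragments(REGION_LENGTH, DOUBLE_DIGEST, FIVE_PRIME_PROBE)))  # 1
```

Since the probe itself contains no cut sites, every additional band must come from a site inside an intron, which is why three bands in the double digest imply at least two introns.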
Isolation and Analyses of Phage Clones Carrying a Prealbumin Gene-Two different human gene libraries, one constructed by Lawn et al. (12) and the other by Tsuzuki et al. (11), were screened by in situ plaque hybridization, using a 32P-labeled human prealbumin cDNA as a probe. Twelve positive clones were isolated from the first library and two clones from the second library.
Digestion of the cloned DNAs with restriction enzymes and Southern blot hybridization, using the prealbumin cDNA as a probe, indicated that all these cloned DNAs covered overlapping portions of a single chromosomal segment. One of the clones, named Lm PAM-5, contained three EcoRI/HindIII restriction fragments of 2.2, 2.0, and 1.05 kilobase pairs in length (Fig. 1d) and seemed to contain the entire prealbumin gene, as only these three bands were detected by Southern blot analysis of the total human genomic DNA (Fig. 1a, lane E/H). We found that Lm PAE-7, which was isolated from the EcoRI partial library, carries the 9.0-kilobase pair EcoRI fragment that hybridizes to the prealbumin cDNA probe (data not shown). The DNA sequencing strategy is shown in Fig. 2b, and the entire nucleotide sequence of the human prealbumin gene region is summarized in Fig. 3. These sequences were determined for the cloned DNAs present in Lm PAM-5 and Lm PAE-7 and include exons, introns, and 5'- and 3'-flanking regions (581 base pairs of the 5'- and 95 base pairs of the 3'-flanking regions). As shown schematically in Fig. 2b, the human prealbumin gene consists of four exons and three introns. The sequences of the four exons were identical to that of the human prealbumin cDNA (9), except for one base substitution in the 3'-untranslated region of the fourth exon, i.e. C at position 6875 is substituted by G. This comparison revealed that the codon for glycine at position 67 (9) is interrupted by the second intron. All the intron/exon junctions apparently obey the AG/GT rule and resemble the consensus acceptor ((Y)11-15NYAG/G) and donor (AG/GTRAGT) sequences described by Mount (16).
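As a rough illustration of the junction rule mentioned above, the following sketch checks whether an intron begins with GT and ends with AG, and measures how pyrimidine-rich the stretch preceding the terminal NYAG is. The intron string is a made-up toy sequence, not taken from the prealbumin gene, and the helper names and the 15-base window are assumptions for illustration.

```python
def obeys_gt_ag(intron):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")

def pyrimidine_fraction(intron, window=15):
    """Fraction of C/T in the `window` bases just upstream of the final NYAG."""
    seq = intron.upper()
    tract = seq[-(window + 4):-4]
    if not tract:
        return 0.0
    return sum(base in "CT" for base in tract) / len(tract)

# Toy intron (hypothetical): donor GTAAGT, filler, pyrimidine tract, ACAG acceptor.
TOY_INTRON = "GTAAGT" + "ACGT" * 5 + "TCTTTCCTTTCTCCT" + "ACAG"

print(obeys_gt_ag(TOY_INTRON))          # True
print(pyrimidine_fraction(TOY_INTRON))  # 1.0
print(obeys_gt_ag("GCAAGTTTTTAC"))      # False
```

A real analysis would score each junction against the full consensus rather than only the invariant dinucleotides; this sketch covers only the simplest part of the rule.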
The transcription initiation site or the cap site was tentatively assigned to residue A located at position +1. This assignment is based on the finding that an A preceded by a C in most cases is the preferred cap site and on the assumption that our prealbumin cDNA clone (9) covers the full length of the human prealbumin mRNA. Furthermore, another independently isolated prealbumin cDNA clone has been reported to start from the same residue A (17). This assignment is also supported by the relative positions of the regulatory sequences, such as the TATA and CAAT boxes, as described below.
Inspection of the region upstream from the cap site revealed the presence of a possible promoter sequence, TATAAAA, between positions -30 and -24. At positions -101 to -96, there was a GTCAAT sequence, which is homologous to the consensus CAAT box. Further upstream, at positions around -270 to -223, we found a characteristic structure consisting of the pentanucleotide sequence TTTTG repeated eight times.
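The positions quoted above are counted relative to the cap site at +1. A minimal sketch of that numbering is given here, assuming the common convention that there is no position 0 (the base immediately upstream of +1 is -1). The toy sequence is hypothetical filler built so that TATAAAA falls at -30 to -24; it is not the real 5'-flanking sequence.

```python
def gene_coordinate(index, cap_index):
    """Convert a 0-based string index to cap-relative numbering (no position 0)."""
    offset = index - cap_index
    return offset + 1 if offset >= 0 else offset

def motif_span(sequence, motif, cap_index):
    """Return (first, last) cap-relative positions of the first motif occurrence."""
    start = sequence.upper().find(motif.upper())
    if start == -1:
        return None
    end = start + len(motif) - 1
    return gene_coordinate(start, cap_index), gene_coordinate(end, cap_index)

# Hypothetical 40-base upstream region followed by a few transcribed bases.
UPSTREAM = "G" * 10 + "TATAAAA" + "C" * 23
TOY_SEQUENCE = UPSTREAM + "ACAGAAG"
CAP_INDEX = len(UPSTREAM)  # 0-based index of the base numbered +1

print(motif_span(TOY_SEQUENCE, "TATAAAA", CAP_INDEX))  # (-30, -24)
```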
The polyadenylation site was inferred by comparing the nucleotide sequence of the 3'-untranslated region of the gene with that of the prealbumin cDNA (9). A polyadenylation signal sequence, AATAAA, was located 23 base pairs upstream from the polyadenylation site. Benoist et al. (18) identified another consensus sequence, TTTTCACTGC, near the polyadenylation site in several mRNAs, and we found a similar sequence, TTTTCACCTC, 40 nucleotides upstream from the AATAAA hexanucleotide in the human prealbumin gene.
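To make the degree of similarity between the two decanucleotides concrete, a small sketch can compare them position by position. Only the two sequences come from the text; the helper name `mismatch_positions` is illustrative.

```python
def mismatch_positions(a, b):
    """Return the 1-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]

CONSENSUS = "TTTTCACTGC"  # consensus reported by Benoist et al. (18)
OBSERVED = "TTTTCACCTC"   # sequence found in the prealbumin gene

differences = mismatch_positions(CONSENSUS, OBSERVED)
matches = len(CONSENSUS) - len(differences)

print(differences)                      # [8, 9]
print(f"{matches}/{len(CONSENSUS)}")    # 8/10
```

The two sequences therefore agree at 8 of 10 positions and differ only at positions 8 and 9.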
In the second and third introns, we found 300-nucleotide sequences strikingly homologous to the human Alu-type repeat elements (19). The characteristic features of the Alu-type repeats were the presence of short poly(A) tracts and a 9-nucleotide or a 15-nucleotide direct repeat (Fig. 3). Similar Alu sequences have been found in introns and exons of other genes (20, 21).
Presence of Two Open Reading Frames in the Introns-One of the characteristic features of this gene is the presence of two independent open reading frames, in the first and third introns. Both open reading frames are provided, in their 5'- and 3'-flanking regions, with the consensus regulatory sequences for transcription. There are two putative initiation codons for the first unidentified reading frame: one is located between nucleotide positions 684 and 686 and the other between positions 688 and 690. When translation starts from the first ATG codon, the frame codes for 60 amino acids, and when it starts from the second codon, there is an open reading frame for 37 amino acids. We found a TATAAAA sequence 106 nucleotides upstream from the first putative initiation codon and a CAAT sequence 55 nucleotides upstream from the TATA sequence. The polyadenylation signal sequence AATAAA is located 94 nucleotides downstream from the termination codon for the first open reading frame (nucleotide positions 864-866).
The second unidentified open reading frame also has two putative initiation codons: one is located between nucleotide positions 6061 and 6063 and the other between positions 6128 and 6130. When translation starts from the first ATG codon, the frame codes for 49 amino acids, and when it starts from the second, there is an open reading frame for 69 amino acids. A possible TATA-equivalent sequence, TATATAT, is located 60 nucleotides upstream from the first putative initiation codon, and the CAAT sequence is located 44 nucleotides upstream from the TATA sequence. In this case, we found two possible poly(A) addition signals, located 45 and 243 nucleotides, respectively, downstream from the termination codon for the second open reading frame (nucleotide positions 6335-6337).
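The relation between codon positions and open reading frame length can be checked with simple arithmetic. The sketch below takes the first nucleotide of an initiation codon and the first nucleotide of a termination codon and returns the number of amino acids encoded, provided the two are in the same frame. The function name `orf_length_aa` is illustrative; the positions are those quoted in the text.

```python
def orf_length_aa(atg_start, stop_start):
    """Number of amino acids encoded from an ATG up to (not including) a stop codon.

    Both arguments are the position of the first nucleotide of the codon.
    Returns None if the stop is not downstream of, or not in frame with, the ATG.
    """
    span = stop_start - atg_start
    if span <= 0 or span % 3 != 0:
        return None
    return span // 3

# First reading frame: ATG at 684-686, termination codon at 864-866.
print(orf_length_aa(684, 864))    # 60

# Second reading frame: ATG at 6128-6130, termination codon at 6335-6337.
print(orf_length_aa(6128, 6335))  # 69

# The ATG at 6061-6063 is not in frame with the stop at 6335-6337.
print(orf_length_aa(6061, 6335))  # None
```

The 60- and 69-residue frames are thus consistent with the stated termination codons. A start codon for which the function returns `None` would have to be read in a different frame ending at a different termination codon.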
The presence of two unidentified reading frames, one in the first intron and the other in the third intron, is an unexpected feature of the human prealbumin gene. The situation in which "one gene's intron is another gene's exon" (22) was first found in the yeast mitochondrial cytochrome b gene (23), where the intronic open reading frame has been shown to code for a maturase, a protein that acts in the splicing of the mitochondrial cytochrome b transcript. Further work will reveal whether the open reading frames in introns 1 and 3 are expressed in vivo.
The characterized DNA segment of the human prealbumin gene described herein should facilitate not only elucidation of the control mechanism of prealbumin gene expression, but also studies on the genetic basis for accumulation of a variant prealbumin in FAP patients.

[Fig. 3. Nucleotide sequence of the human prealbumin gene region. The sequence listing (rows ending at positions 1099 to 1379 in this fragment) is not legibly reproduced here.]