The Mitochondrial Uncoupling Protein Gene: Correlation of Exon Structure to Transmembrane Domains*

The mitochondrial uncoupling protein, a protein essential for the thermogenic properties of brown fat in mammals, is inserted in the inner mitochondrial membrane by means of six α-helical hydrophobic transmembrane domains. We have sequenced a complete cDNA and parts of the gene and find that the mitochondrial uncoupling protein gene is composed of six exons, each of which encodes a transmembrane domain. We also show that transcription of the uncoupling protein gene is from a single start site; however, the use of alternative poly(A) addition signal sequences results in two mRNAs, the major species of 1221 nucleotides, not including the poly(A) tail, and a minor species of about 1600 nucleotides. The 5'-untranslated region of the mRNA is composed of 231 nucleotides, and the 3'-untranslated region contains 81 nucleotides prior to addition of the poly(A) tail.

Three ion carrier proteins of the inner mitochondrial membrane, the ADP/ATP translocator, the uncoupling protein (Ucp)¹ of brown fat, and the phosphate carrier protein, are characterized by the presence of six α-helical hydrophobic transmembrane domains by which they are inserted into the mitochondrial membrane (1-4). Similarities in amino acid sequence among the proteins suggest that they evolved from a common ancestral gene (5). All three proteins share the same basic repetitive structure: a tripartite arrangement of three homologous repeating segments of about 100 amino acids, each segment containing two transmembrane domains (2). We now report that the structure of the nuclear gene for the mitochondrial Ucp reflects this domain structure: each of the six exons in the gene encodes one of the transmembrane domains. We also show that transcription of the uncoupling protein gene is from a single start site and that the use of alternative poly(A) addition signal sequences results in two mRNAs, the major species of 1221 nucleotides, not including the poly(A) tail, and a minor species of about 1600 nucleotides.

MATERIALS AND METHODS³
* This work was supported by National Institutes of Health Grant HDOS431 (to L. P. K.). The Jackson Laboratory is fully accredited by the American Association for Accreditation of Laboratory Animal Care. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
‡ To whom correspondence should be sent.
¹ The abbreviations used are: Ucp, mitochondrial uncoupling protein; kb, kilobase pairs; bp, base pairs.
Portions of this paper (including "Materials and Methods," Footnote 2, and Fig. 1) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are included in the microfilm edition of the Journal that is available from Waverly Press.

RESULTS AND DISCUSSION
We previously isolated a cDNA clone for the mitochondrial uncoupling protein (UCP) from mouse brown fat (6). By Northern blot analysis, this cDNA hybridized to two mRNAs, a major species of approximately 1300 nucleotides and a minor species of 1700 nucleotides. Both mRNAs were highly induced when mice were exposed to the cold at 5 °C, suggesting that they were derived from the same transcriptional unit. To determine the molecular mechanisms controlling Ucp gene expression in brown fat, we have been investigating the structure of both the Ucp mRNA and the Ucp gene. Additional screening of our brown fat cDNA library has provided a cDNA sequence that is complementary to the full length of the major Ucp mRNA species (Fig. 1A). The Ucp gene was isolated from a Charon 28 BALB/c embryo DNA library as two overlapping clones. We have sequenced the cDNA clones (Fig. 1A), used synthetic primers to sequence the Ucp mRNA directly and so define its 5' end, and sequenced parts of the 5' and 3' flanking regions and the exons of the Ucp gene (Fig. 1B).
The start site for transcription was determined by comparing the RNA sequence to the genomic sequence, as illustrated in Fig. 2A. Reverse transcription of the RNA terminates with a G, and sequencing of an M13 clone of the 5' flanking region of the gene establishes this G as the start site for transcription. The site is located 231 bp upstream of the ATG translation initiation codon and 26 bp downstream of the sequence TATATA, which has the sequence and position expected of a TATA box promoter element (14). The sequence GCCGGG at the 5' end of the mRNA is also present in the cDNA clone p-Ucp 2.
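The start-site mapping logic can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure (which relied on direct RNA sequencing and an M13 genomic clone): the helper `map_tss` is hypothetical, the genomic sequence is a toy built only to mimic the reported spacing, it assumes the 5'-terminal RNA sequence occurs exactly once in the genomic coding strand, and the pattern `TATA[AT][AT]` is an assumed, deliberately loose TATA-like motif.

```python
import re


def map_tss(genomic: str, rna_5prime: str, atg_index: int) -> dict:
    """Locate a transcription start site (TSS) by matching the 5'-terminal RNA
    sequence (written as DNA, U -> T) against the genomic coding strand.

    atg_index is the 0-based position of the A of the initiation ATG.
    """
    tss = genomic.find(rna_5prime)
    if tss < 0:
        raise ValueError("5' end of the RNA not found in the genomic sequence")
    # 5'-UTR length = nucleotides from the TSS up to, but not including, the ATG
    utr5_length = atg_index - tss
    # Keep the TATA-like hexamer closest to the TSS (lookahead allows overlaps)
    tata_start = None
    for match in re.finditer(r"(?=TATA[AT][AT])", genomic[:tss]):
        tata_start = match.start()
    # Gap (bp) between the end of that hexamer and the TSS
    tata_to_tss = None if tata_start is None else tss - (tata_start + 6)
    return {"tss": tss, "utr5_length": utr5_length, "tata_to_tss": tata_to_tss}


# Toy sequence (hypothetical), built only to reproduce the spacing reported above
toy = "GGGCCC" + "TATATA" + "C" * 26 + "GCCGGG" + "C" * 225 + "ATGAAA"
print(map_tss(toy, "GCCGGG", atg_index=269))
# {'tss': 38, 'utr5_length': 231, 'tata_to_tss': 26}
```

On real data the RNA 5' end would first have to be read from the sequencing ladder; the sketch only shows how the positional landmarks (231 bp to the ATG, 26 bp from the TATA-like element) follow once that end is placed on the genomic sequence.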
In addition, the 5' end of the cDNA p-Ucp 2 carries 179 nucleotides that do not match the Ucp mRNA sequence or any sequence in the genomic clones (Fig. 1A). This sequence hybridizes to an RNA of approximately 4400 nucleotides in brown fat and maps to a site on chromosome 7 near the apolipoprotein E gene (data not shown), whereas Ucp maps to chromosome 8. We therefore conclude that these 179 nucleotides were joined to the 5' end of the Ucp cDNA during construction of the cDNA library.
Sequencing of the cDNA defined the 3' end of the major mRNA species [...] major species; the other is 399 bases downstream of the first AAUAAA signal sequence. To test whether this downstream signal sequence is used to produce the larger mRNA, a Northern blot of brown fat RNA was hybridized (Fig. 2B) with an upstream probe containing sequences from the region defining the 3' end of the cDNA (probe A) and with a downstream probe that extends just beyond the third signal sequence (probe B). Probe A hybridized to both RNAs, whereas probe B hybridized only to the minor, high molecular weight Ucp RNA; these results strongly indicate that the high molecular weight RNA is produced by use of the downstream AAUAAA site. In summary, the Ucp gene appears to have one major start site for transcription, and the sixth exon can have a short or long form depending on which poly(A) addition signal sequence is used. The second poly(A) addition signal sequence may also be used, yielding an mRNA that could not be resolved from the major form by Northern blot analysis.
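The reasoning behind the two-probe experiment can be expressed as a small Python model. It is a sketch under stated assumptions, not the authors' analysis: coordinates are measured from the transcription start site, each transcript is assumed to end at a fixed position set by its poly(A) signal, a probe is assumed to detect any transcript it overlaps, and the helper names (`find_polya_signals`, `probe_detects`, `hybridization_pattern`) and probe coordinates are hypothetical. Only the transcript lengths (1221 and about 1600 nucleotides) come from the text.

```python
import re


def find_polya_signals(seq: str) -> list[int]:
    """0-based start positions of every AATAAA hexamer (DNA form of AAUAAA)."""
    return [m.start() for m in re.finditer(r"(?=AATAAA)", seq)]


def probe_detects(probe: tuple[int, int], transcript_end: int) -> bool:
    """True if a probe spanning [start, end) in transcript coordinates (TSS = 0)
    overlaps a transcript that ends at transcript_end."""
    start, _end = probe
    return start < transcript_end


def hybridization_pattern(probes: dict, transcripts: dict) -> dict:
    """For each probe, list the transcripts it is expected to detect."""
    return {
        name: sorted(t for t, t_end in transcripts.items() if probe_detects(span, t_end))
        for name, span in probes.items()
    }


# Transcript lengths (excluding poly(A)) from the text; probe spans are hypothetical.
transcripts = {"major": 1221, "minor": 1600}
probes = {"A": (1100, 1221), "B": (1300, 1590)}
print(hybridization_pattern(probes, transcripts))
# {'A': ['major', 'minor'], 'B': ['minor']}
```

A probe lying entirely downstream of the first poly(A) addition site can only detect the longer transcript, which is the pattern observed for probe B.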
The protein sequence of mouse UCP was deduced by translation of the cDNA sequence. It shows strong homology to the hamster and rat proteins (2, 3, 15); the proteins of all three species contain 306 amino acids. The mouse protein differs from the rat protein at 8 positions and from the hamster protein at 24 positions, 20 of which are conservative substitutions. Rat and hamster have previously been noted to differ from each other at 26 positions (15). At the nucleotide level, the coding regions of the mouse and rat cDNAs share 93% sequence identity. The 5'-untranslated region shows less similarity, but it cannot be fully evaluated until the complete rat sequence is known: the 5'-untranslated region of the mouse mRNA consists of 231 nucleotides, whereas only 177 nucleotides were reported for the largest rat cDNA clone (3).
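Because all three proteins are 306 residues long, the difference counts above translate directly into percent identity for a gap-free alignment. The Python sketch below makes that arithmetic explicit; the helper names are hypothetical, and the gap-free assumption is ours, inferred from the equal lengths.

```python
def count_differences(seq_a: str, seq_b: str) -> int:
    """Number of mismatched positions between two gap-free, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def percent_identity(length: int, differences: int) -> float:
    """Percentage of identical positions in a gap-free alignment."""
    return 100.0 * (length - differences) / length


UCP_LENGTH = 306  # residues in the mouse, rat, and hamster proteins
for pair, diffs in [("mouse/rat", 8), ("mouse/hamster", 24), ("rat/hamster", 26)]:
    print(f"{pair}: {percent_identity(UCP_LENGTH, diffs):.1f}% identical")
```

With the counts from the text this gives about 97.4% (mouse/rat), 92.2% (mouse/hamster), and 91.5% (rat/hamster) amino acid identity.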
The sequencing strategy for the exons of the Ucp gene is described in Fig. 1B. The sequences of these exons and the adjoining intron regions show that the Ucp gene is composed of six exons (Fig. 1, Miniprint section). Fig. 1 (Miniprint section) also shows the differences in sequence found between the cDNA p-Ucp 2 and the genomic sequence. These differences are located at position 145 in the 5'-untranslated region and at codons 247 and 299. Because these substitutions do not change the amino acid sequence, we think the differences are genuine and arise from the different genotypes of the mouse strains from which the cDNA (C57BL/6J) and the genomic DNA (BALB/c) were derived.
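Deciding that a coding-region substitution is silent amounts to translating both codons with the standard genetic code and comparing the results. A minimal Python sketch follows; `CODON_TABLE` and `is_silent` are hypothetical names, and the codon pairs shown are illustrative only, since the actual codons at positions 247 and 299 are not reproduced in this text.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered by first, second, third base in TCAG order
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def is_silent(codon_a: str, codon_b: str) -> bool:
    """True if two codons encode the same amino acid (or are both stop codons)."""
    return CODON_TABLE[codon_a.upper()] == CODON_TABLE[codon_b.upper()]


# Illustrative codon pairs only (hypothetical, not the Ucp codons)
print(is_silent("CTT", "CTC"))  # Leu -> Leu: silent
print(is_silent("GCT", "GAT"))  # Ala -> Asp: amino acid change
```

Most silent substitutions fall at the third codon position, which is why a strain polymorphism can alter the cDNA and genomic sequences without altering the protein.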
The location of the introns relative to the domain structure of the uncoupling protein is clearly evident (Fig. 3). Introns II and IV interrupt the coding sequence within codons 108 and 209, dividing the coding region into the three 100-amino acid repetitive segments identified from sequence similarities among the inner membrane carrier proteins (2, 4). Introns I, III, and V further subdivide the coding region: introns I and III fall between transmembrane domains A and B and between domains C and D, respectively, while intron V is located in codon 269, slightly within the region that encodes transmembrane domain F. Intron I interrupts the gene at codon 42, in a region that encodes a β strand possibly associated with a membrane pore (2). Remarkably, each of the transmembrane domains of the UCP is encoded by a separate exon.
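As a quick arithmetic check on the statement that introns II and IV divide the 306-codon coding region into three segments of about 100 amino acids, the segment lengths can be computed from the intron positions. `segment_lengths` is a hypothetical helper; intron III is left out because its codon position is not stated here.

```python
def segment_lengths(total_codons: int, intron_codons: list[int]) -> list[int]:
    """Approximate lengths (in codons) of the coding segments delimited by introns.

    Each intron lies within the given codon, so every boundary is uncertain by
    about one codon.
    """
    edges = [0] + sorted(intron_codons) + [total_codons]
    return [end - start for start, end in zip(edges, edges[1:])]


# Introns II and IV (codons 108 and 209) in the 306-codon coding region
print(segment_lengths(306, [108, 209]))  # [108, 101, 97]
```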
A relationship between introns and exons encoding transmembrane domains was first observed for bovine rhodopsin, which has seven transmembrane domains (16). Introns in the rhodopsin gene interrupt the coding region at three positions that mark the boundaries of the hydrophobic regions. A more striking correlation was recently described for the 3-hydroxy-3-methylglutaryl coenzyme A reductase gene, in which each of the seven transmembrane domains that enable the protein to be inserted in the endoplasmic reticulum is separated by an intron (17). The single transmembrane domain of the H-2 molecule is also encoded by a separate exon (18). A striking contrast to a gene organization in which each transmembrane domain is encoded by a separate exon is found in the [...] entire protein is encoded by a single exon. The only G-protein-coupled receptor gene known to have an exon/intron structure is rhodopsin, and it was the first gene found to show a relationship between transmembrane domains and intron/exon structure (16). It is unlikely that the presence or absence of introns in the coding region for transmembrane domains has any functional importance; however, insights into the evolutionary relationships among proteins with transmembrane domains may become evident when the structures of other genes encoding such proteins are determined.

[...] (8). The sequence of the Ucp RNA was obtained by the procedure described by Geliebter et al. (10), using a synthetic primer derived from a sequence overlapping the PstI site in exon I (Fig. 1B).

Fig. 1 (Miniprint section): nucleotide sequence of the Ucp cDNA and gene exons with the deduced amino acid sequence.