Human erythroid 5-aminolevulinate synthase. Gene structure and species-specific differences in alternative RNA splicing.

Erythroid 5-aminolevulinate synthase (ALAS) is expressed exclusively in differentiating erythroid cells as the principal isoform of the enzyme to catalyze the first step of the heme biosynthetic pathway. The human gene encoding this isozyme was isolated from a cosmid library, and its structure was characterized with restriction mapping followed by sequencing of fragments. The gene is 22 kilobases long and has 11 exons. Exon 2 encodes the N-terminal signal sequence required for mitochondrial import, exons 3 and 4 encode a variable portion of the N-terminal end, and exons 5-11 the highly conserved C-terminal portion of the mature protein, respectively. Enzymatic amplification of human reticulocyte RNA using PCR techniques revealed two erythroid ALAS mRNA transcripts predicted to encode both the prototypical 64-kDa isoform as well as a novel smaller isoform with a deletion of 37 amino acids near the N terminus. The two mRNA isoforms are generated by alternative splicing of exon 4 and are expressed in fetal erythroid cells as well as at all stages of erythroid development tested, so that there is no evidence of differentiation-specific regulation of exon 4 splicing. However, striking species-specific differences were observed in that alternative splicing of exon 4 was found in man but not dog or mouse; also, the previously described alternative splicing within exon 3 in mouse was not observed in man. This transcript heterogeneity suggests the existence of erythroid ALAS protein isoforms with potentially distinct functional or regulatory roles. The occurrence of species-specific splicing in the least conserved region of the enzyme may reflect another mechanism of gene evolution in eukaryotes.

Biosynthesis of heme, an iron-porphyrin prosthetic group required in large amounts during erythropoiesis for assembly into hemoglobin molecules, involves the coordinated activities of a series of nuclear-encoded mitochondrial and cytoplasmic enzymes. The first and rate-limiting step in the heme biosynthetic pathway is the condensation of glycine and succinyl-CoA, which in erythroid cells is principally catalyzed by an erythroid-specific isoform of the mitochondrial matrix enzyme 5-aminolevulinate synthase (ALAS) (1-3). The gene for erythroid ALAS has recently been cloned and mapped to the X chromosome (4-6). A distinct housekeeping isoform of ALAS, encoded by a separate gene on chromosome 3 (5, 7), functions in synthesizing lesser amounts of heme in early erythroid and all non-erythroid tissues (2, 3). Understanding the regulation of these genes is important both for basic studies of erythroid differentiation and for elucidation of the pathophysiology of genetic diseases, such as certain sideroblastic anemias and porphyrias (8, 9).
Molecular studies of erythroid ALAS gene expression have shown that it is subject to both transcriptional and posttranscriptional control (6, 10, 11). Erythroid ALAS mRNA is transcribed only in erythroid cells (1, 6). The 5'-flanking sequence of this ALAS gene contains erythroid-specific promoter elements including a GATA-1 binding site (6), which may interact with the GATA-1 transcription factor (12) to restrict expression to erythroid cells. Following treatment of either mouse erythroleukemia (MEL) cells with dimethyl sulfoxide (10, 11) or mouse erythroid (J2E-1) cells with erythropoietin,* the level of ALAS mRNA rises dramatically, at least in part due to increased transcription during terminal differentiation. At the translational level, control is believed to be imparted by the iron-dependent binding of a cytosolic protein (IRE binding protein) to a stem-loop secondary structure, called an iron-responsive element (IRE), in the 5'-untranslated region (5'-UTR) of the mRNA (2, 3, 6, 13). By analogy with the better characterized ferritin and transferrin receptor mRNA IREs (reviewed in Refs. 14 and 15), it appears likely that low iron concentrations promote binding of the IRE binding protein to the ALAS IRE to specifically block its translation, whereas an increase in intracellular iron leads to dissociation of this factor and increased synthesis of ALAS protein (6, 13).
In this paper we report the genomic organization of the human erythroid ALAS gene using standard cloning techniques and molecular characterization of ALAS mRNAs from peripheral blood reticulocytes using PCR techniques. Most importantly, these studies have revealed alternative splicing events within human erythroid ALAS that differ from those described earlier in the mouse (16). Species-specific differences in the expression of ALAS mRNA isoforms indicate that evolutionary changes in the splicing of this gene have occurred since the divergence of man and mouse. Moreover, such splicing changes may represent another potential level of regulation of ALAS structure and function, in addition to the transcriptional and translational controls described earlier.
The abbreviations used are: ALAS, 5-aminolevulinate synthase; IRE, iron-responsive element; UTR, untranslated region; PCR, polymerase chain reaction; kb, kilobase pair(s); bp, base pair(s).
* T. C. Cox, M. J. Bawden, and B. K. May, unpublished data.

MATERIALS AND METHODS
Genomic Library Screening-A human genomic library, constructed by partial MboI digestion of genomic DNA and cloning into the BamHI site of the eukaryotic expression vector, pAVCV007, was kindly supplied by Dr. K. H. Choo (17). The library was screened by the procedure of Sambrook et al. (18) using randomly primed [α-32P]dATP-radiolabeled pHEA-6 cDNA (6).
Restriction Mapping and Sequencing of Intron/Exon Boundaries-Exon positioning and restriction mapping of the genomic clone, PTC-EA1, were done by digestion with the restriction enzymes BamHI, BglII, EcoRI, HindIII, PstI, SacI, and combinations thereof, and by use of various labeled restriction fragments and oligonucleotides derived from pHEA-6 as probes in Southern blot analysis. Intron/exon boundaries were determined following sequencing of appropriately subcloned genomic fragments (in pBluescript and pTZ18R or pTZ19R DNAs) using the Sequenase Version 2.0 sequencing kit (U. S. Biochemical Corp.).
RNA Isolation-Human reticulocyte RNA was prepared from normal peripheral blood as described previously (19). In our hands, this isolation procedure yields >10 µg of total reticulocyte RNA from as little as 10 ml of normal blood having 1-2% reticulocyte counts. Reticulocyte RNA from mouse and dog peripheral blood was isolated by the same procedure. RNA from erythroid cell lines, HepG2 cells, B and T lymphocytes, and from the cultured human brain tumor cell line 126 (20) was extracted by published protocols (21). RNA from preproerythroblasts, prepared from normal human bone marrow by an avidin-biotin immune rosetting technique (22), was a gift from Dr. Laura Coulombel (Hôpital de Bicêtre, Paris, France). This procedure yields a cell population containing CFU-E and their immediate progeny defined as "preproerythroblasts" (22).
PCR Amplification of Erythroid ALAS mRNA and Sequencing-2 µg of total RNA was reverse-transcribed into single-stranded cDNA at 37 °C for 120 min in a 50-µl reaction containing: 40 mM KCl, 50 mM Tris-HCl (pH 8.3), 8 mM MgCl2, 0.5 mM dNTPs, 100 ng of random hexanucleotide primer, 10 units of RNasin, and 100 units of murine leukemia virus reverse transcriptase. 10 µl of cDNA was amplified in a 100-µl PCR reaction containing Taq polymerase buffer (50 mM KCl, 10 mM Tris-HCl, 1.5 mM MgCl2, 0.1% gelatin) supplemented with 50 pmol each of sense- and antisense-strand oligonucleotides (see below), additional dNTPs to a final concentration of 0.2 mM, and 5 units of Taq polymerase. Thirty cycles of amplification were performed using an automated Perkin-Elmer Cetus thermal cycler under the following conditions: denaturation for 30 s at 94 °C, reannealing for 30 s at 50-55 °C, extension for 1 min, 45 s at 72 °C. DNA fragments were analyzed on 5% polyacrylamide gels. For sequencing, extracted fragments were digested with EcoRI, gel-purified and ligated into pBluescript, and sequenced in the double-stranded plasmid. Identities of all PCR fragments were confirmed by DNA sequence analysis. Oligonucleotides were as shown in Table I.
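The reaction setup above implies some simple arithmetic that can be checked with a short sketch. It estimates how much of the final 0.2 mM dNTP concentration is carried over with the cDNA aliquot (one-tenth of the PCR volume, from a reverse transcription reaction at 0.5 mM dNTPs) and the total hold time of the 30-cycle program. The function names are illustrative, and ignoring ramp times between temperatures is an assumption.

```python
# Worked arithmetic for the PCR setup (an illustrative sketch, not the authors' calculation).

def dntp_carryover_mM(rt_dntp_mM, cdna_volume, pcr_volume):
    """dNTP concentration contributed to the PCR by the cDNA aliquot.

    The two volumes only need to share a unit, since only their ratio matters.
    """
    return rt_dntp_mM * cdna_volume / pcr_volume


def cycling_time_min(cycles, step_seconds):
    """Total hold time across all cycles, ignoring ramp times (an assumption)."""
    return cycles * sum(step_seconds) / 60.0


# cDNA aliquot is one-tenth of the PCR volume; RT reaction contained 0.5 mM dNTPs.
carryover = dntp_carryover_mM(rt_dntp_mM=0.5, cdna_volume=10, pcr_volume=100)
extra_needed = 0.2 - carryover  # additional dNTPs to reach the stated 0.2 mM final

# 30 s denaturation, 30 s reannealing, 1 min 45 s (105 s) extension, 30 cycles.
total_min = cycling_time_min(30, [30, 30, 105])

print(f"carryover: {carryover:.2f} mM, added: {extra_needed:.2f} mM")
print(f"cycling time: {total_min:.1f} min")
```

Under these assumptions the carryover is 0.05 mM, leaving 0.15 mM to be added, and the program holds for 82.5 min in total.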

RESULTS
Genomic Structure of the Human Erythroid ALAS Gene-To isolate the human gene encoding erythroid ALAS, a cosmid library in the vector pAVCV007 was screened using the essentially full-length erythroid ALAS cDNA insert from pHEA-6 (6) as the hybridization probe. A total of 8 × 10⁵ recombinant cosmid clones (equivalent to 10 haploid genomes) were screened, and four strongly positive hybridization signals were obtained. One clone, PTC-EA1, was chosen for detailed analysis since it contained an insert of approximately 36 kb and preliminary restriction mapping and Southern blot analysis showed it to contain the entire gene for erythroid ALAS together with substantial 5'- and 3'-flanking regions.
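The statement that the screened clones correspond to about 10 haploid genomes can be checked with a rough calculation. The sketch below assumes a human haploid genome of about 3 × 10^9 bp and takes the approximately 36-kb insert of the chosen clone as the mean insert size. Neither figure is stated as a library average in the text, so both are assumptions. It also applies the standard Clarke-Carbon estimate of the probability that a given single-copy sequence is represented.

```python
# Rough library-coverage check; genome size and mean insert size are assumptions.
HAPLOID_GENOME_BP = 3.0e9   # assumed approximate human haploid genome size
MEAN_INSERT_BP = 36_000     # assumed mean cosmid insert, based on the ~36-kb clone


def clones_for_genome_equivalents(n_genomes,
                                  genome_bp=HAPLOID_GENOME_BP,
                                  insert_bp=MEAN_INSERT_BP):
    """Number of clones whose inserts sum to n_genomes haploid genomes."""
    return n_genomes * genome_bp / insert_bp


def probability_represented(n_clones,
                            genome_bp=HAPLOID_GENOME_BP,
                            insert_bp=MEAN_INSERT_BP):
    """Clarke-Carbon estimate: P = 1 - (1 - f)^N, with f = insert / genome."""
    f = insert_bp / genome_bp
    return 1.0 - (1.0 - f) ** n_clones


n_clones = clones_for_genome_equivalents(10)
p = probability_represented(n_clones)
print(f"clones for 10 genome equivalents: {n_clones:,.0f}")
print(f"probability a single-copy sequence is present: {p:.5f}")
```

With these assumed values, 10 genome equivalents correspond to roughly 8.3 × 10^5 clones, at which depth a single-copy gene is almost certain to be represented at least once.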
The locations of exon sequences within the genomic clones were tentatively identified following mapping with six restriction enzymes, BamHI, BglII, EcoRI, HindIII, PstI, and SacI, and hybridization of 32P-labeled fragments of the cDNA clone, pHEA-6. In regions of the cDNA where no convenient restriction endonuclease sites were present, oligonucleotides (18- and 19-mers) were radiolabeled and used as probes to map exon sequences within the genomic restriction fragments. The precise locations and sizes of the exons and introns were then defined by sequencing of appropriately cloned restriction fragments and alignment with the cDNA sequence. This analysis revealed that the erythroid ALAS gene is approximately 22 kb in length and is organized into 11 exons (Fig. 1). The exons range in size from 37 to 270 bp and the introns from 561 bp to 6.0 kb, with the longest intron being located in the 5'-UTR of the gene. The intron/exon boundaries are shown in Table II and are mostly consistent with the consensus sequences for splice donor and acceptor sites (23). One exception is the 3' splice junction of exon 5, which ends with ACA rather than the consensus (C/A)AG. From an examination of the gene organization and the known cDNA sequences for ALAS enzymes from various sources (6, 24) it can be deduced that exon 2 encodes the N-terminal signal sequence (designated region 1), exons 3 and 4 encode a portion of the N-terminal end of the mature protein (region 2), while exons 5-11 encode the remaining highly conserved C-terminal region of the mature protein of about 440 amino acids (region 3). It is relevant that the bacterial ALAS proteins characterized comprise only region 3, which therefore must contain the catalytically active domain (6, 24). In addition, this correspondence between the assigned regions of the protein and the exon arrangement lends further support, from an evolutionary standpoint, for the functional integrity of the domains.
Amplification of Erythroid ALAS mRNA from Reticulocytes-For studies aimed at investigating the structure of erythroid ALAS in patients with blood disorders, we sought to amplify ALAS mRNA from peripheral blood reticulocytes as a readily accessible source of this mRNA. For amplification by PCR, five pairs of oligonucleotides (I-V, see Table I) were designed to bind specifically to the erythroid ALAS cDNA, but not to the housekeeping ALAS cDNA. The expected PCR fragments from the reactions would range in size from 280 bp to 472 bp and overlap the entire cDNA sequence (Fig. 2). Reactions I, III, IV, and V yielded DNA fragments of exactly the predicted sizes (Fig. 2C). However, reaction II generated two fragments of 472 and 361 bp, suggesting that there may be sequence heterogeneity among the ALAS mRNA population. A similar observation was made in earlier studies of protein 4.1 mRNA in reticulocytes, where splicing-mediated insertion or deletion of exon(s) within a given PCR fragment was manifested by production of multiple PCR fragments of different sizes (25, 26). To examine whether a similar phenomenon might be occurring in the ALAS gene, we compared the sequence of the two PCR products of 472 and 361 bp with the known genomic structure of the ALAS gene (see Fig. 1). The smaller DNA band precisely lacked the 111 bp corresponding to exon 4 (Fig. 3), while the larger product retained the exon 4 sequence. The deduced ALAS protein isoforms would be identical except for the presence or absence of 37 amino acids extending from Asp''* to [...]. These results establish that two ALAS mRNA species in circulating reticulocytes arise by an alternative splicing event and indicate that there is structural heterogeneity in the N-terminal region (region 2) of the ALAS protein expressed in human erythroid cells.
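The link between the two PCR product sizes and the predicted protein isoforms can be made explicit with a small calculation. The sketch below takes only the fragment lengths reported above (472 and 361 bp) and checks that the difference is a whole number of codons, which is what allows the shorter transcript to encode an otherwise identical protein with an internal deletion. The function name is illustrative.

```python
# Illustrative check that skipping exon 4 preserves the reading frame.

def exon_skip_summary(long_bp, short_bp):
    """Compare two PCR product lengths that differ by one skipped exon."""
    deleted_bp = long_bp - short_bp
    in_frame = deleted_bp % 3 == 0
    # Codons removed is only meaningful when the deletion is in frame.
    codons_removed = deleted_bp // 3 if in_frame else None
    return {
        "deleted_bp": deleted_bp,
        "in_frame": in_frame,
        "codons_removed": codons_removed,
    }


summary = exon_skip_summary(long_bp=472, short_bp=361)
print(summary)
# {'deleted_bp': 111, 'in_frame': True, 'codons_removed': 37}
```

Because 111 is divisible by 3, the downstream reading frame is unchanged, consistent with a deletion of 37 amino acids rather than a frameshift.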
Additional PCR experiments were performed to explore the expression of the alternatively spliced ALAS mRNAs during erythroid cell differentiation. Fig. 4 shows that cells from different stages of erythroid cell differentiation, including preproerythroblasts and reticulocytes, as well as the erythroid cell lines K562, HEL, and KU812, expressed both isoforms of ALAS mRNA in relatively constant proportions. In subsequent studies, these DNA fragments were also observed in the amplified erythroid ALAS mRNA from human first trimester fetal liver and adult bone marrow. In addition, the two erythroid ALAS mRNA isoforms have been detected by Northern blot analysis of total RNA from adult bone marrow and reticulocytes (data not shown). Thus, in contrast to the differentiation-specific program of protein 4.1 mRNA splicing described earlier (26,27), no evidence for regulation of ALAS alternative splicing was obtained. However, strict transcriptional regulation of the erythroid ALAS gene was confirmed as none of the non-erythroid cells tested (B and T lymphocytes, HepG2, and brain 126) showed any evidence of erythroid ALAS expression (Fig. 4) even by this very sensitive PCR assay. In addition, no alternative splicing across region 2 was observed for the human housekeeping ALAS mRNA when amplified from non-erythroid cells or the hematopoietic tissues, human fetal liver, and adult bone marrow, in which expression was detected (data not shown).

Comparison of Erythroid ALAS mRNA Splicing across Species-Interestingly, alternative splicing event(s) affecting the N-terminal structure of ALAS are not conserved across species. Previous studies of mouse erythroid ALAS mRNA have identified two isoforms which differ by insertion or deletion of 45 nucleotides in exon 3, produced apparently by utilization of alternate acceptor sites (16). PCR analysis of human erythroid ALAS mRNA did not reveal this splice isoform (data not shown), and comparison of mouse and human nucleotide sequences provides a good molecular explanation for this difference (Fig. 5A). Mouse exon 3 has two potential splice acceptor sites, a major upstream site (~85% of mRNAs) and a minor downstream site (~15%) (16). Both mouse acceptor sites terminate with the consensus dinucleotide AG normally found at the 3' end of an intron. In the human sequence, a single base change A→G disturbs the consensus sequence at the downstream site, eliminating the possibility of alternative splicing within exon 3. In addition, a second critical nucleotide substitution may strengthen the upstream site by providing a better match to the consensus N(T/C)AG sequence (Fig. 5A).
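The effect of a single A→G change on a candidate acceptor site can be illustrated with a minimal consensus check. The sketch below tests whether a sequence ends in N(T/C)AG, the acceptor consensus cited above. The three test sequences are hypothetical and are not the mouse or human sequences of Fig. 5A, and a four-nucleotide match is a deliberate simplification of real splice site recognition.

```python
import re

# Acceptor consensus N(T/C)AG at the 3' end of the sequence preceding the splice point.
ACCEPTOR_CONSENSUS = re.compile(r"[ACGT][CT]AG$")


def matches_acceptor_consensus(upstream_seq):
    """Return True if the sequence ends with N(T/C)AG."""
    return bool(ACCEPTOR_CONSENSUS.search(upstream_seq.upper()))


# Hypothetical sequences for illustration only (not taken from Fig. 5A).
intact_site = "CTTCAG"    # ends in TCAG: pyrimidine followed by AG
mutated_site = "CTTCGG"   # A->G change destroys the AG dinucleotide
weak_site = "CTTAAG"      # keeps AG but lacks the preceding pyrimidine

for name, seq in [("intact", intact_site),
                  ("mutated", mutated_site),
                  ("weak", weak_site)]:
    print(name, matches_acceptor_consensus(seq))
```

In this simplified model, the intact site matches while the other two do not. This mirrors the argument that loss of the AG dinucleotide removes a usable acceptor, and that a pyrimidine before the AG gives a better consensus match.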
To examine whether alternative splicing of exon 4 is phylogenetically conserved, oligonucleotide primer set VI spanning exons 3-5 was used to amplify reticulocyte mRNA from human, mouse, and dog sources. Human reticulocyte ALAS RNA exhibited alternative splicing of exon 4 (Fig. 5C); the same results were obtained using samples obtained from four other unrelated individuals (data not shown). In contrast, the splicing apparatus in murine and canine erythroid cells apparently treats exon 4 as a constitutive exon, as only the upper PCR product was obtained (Fig. 5C). Again, these results were confirmed by analysis of two mouse erythroid RNA sources (reticulocytes and cultured mouse erythroleukemia cells), and two canine sources (reticulocytes from two different dogs). The molecular explanation for the differential splicing of exon 4 in human is not clear, and a comparison of the intron boundary sequence and splice junctions with the murine sequence (Fig. 5B) does not offer a ready explanation.
FIG. 5. A, ... (GenBank M15268) sequences at a potential alternative splice acceptor site in exon 3. Mouse ALAS exon 3 contains an internal AG dinucleotide (thin arrow) that can serve as a splice acceptor site, leading to deletion of 45 nucleotides/15 amino acids from the corresponding protein (16). In human ALAS mRNA, the A→G substitution (asterisk) would inactivate this potential splice site, leaving only the upstream AG (thick arrow) as a splice acceptor. Nucleotide differences are boxed. B, comparison of human and mouse intron/exon boundaries surrounding exons 3, 4, and 5 of the erythroid ALAS gene. In the production of the human erythroid ALAS mRNA, the sequence encoded by exon 4 is alternatively spliced, giving rise to two mRNAs. Nucleotide differences are boxed. No ready explanation (from this sequence comparison) for the difference in alternative splicing can be given. C, total reticulocyte RNA from human, mouse, and dog sources was transcribed into cDNA and amplified with primer set VI, specific for exons 3-5 of erythroid ALAS. Two species of ALAS mRNA were detected in the human sample, one including and one skipping exon 4. Only one isoform, including exon 4, was present in mouse and dog.

DISCUSSION
Characterization of the gene for human erythroid ALAS revealed that it spans approximately 22 kb and contains 11 exons, with a notable feature being the presence of a 6.0-kb intron in the 5'-UTR. This structural organization is remarkably similar to that reported for the mouse erythroid ALAS gene (16), although there is a marked difference in the sizes of introns 6 and 7. The organization of the human erythroid ALAS gene also resembles that of the housekeeping ALAS gene from chicken (28). While the chicken housekeeping ALAS gene lacks an intron in the 5'-UTR, we have recently shown that the housekeeping genes for rat and human ALAS both contain a single intron in this region. It seems probable that the erythroid and housekeeping ALAS genes evolved by addition of DNA sequences encoding the different regions 1 and 2 to a primitive catalytic protein encompassing region 3, with subsequent duplication and divergence of this gene.
Although the basic structure of these ALAS genes is similar, studies of mature erythroid ALAS mRNAs demonstrate variations in splicing among the mammalian species examined. Alternative splicing events within exon 3 involving two potential splice acceptor sites were observed in mouse (16) but not in humans, whereas alternative splicing of exon 4 was seen in man but not mouse or dog (this paper). This result is in contrast to our previous studies of alternative splicing in the erythroid protein 4.1 gene, in which an alternative splice that affects a functionally important domain is conserved in humans, mouse, and dog (26, 27, 29). The lack of phylogenetic conservation in the erythroid ALAS gene implies that we are detecting events in the evolution of ALAS structure and function that postdate the divergence of man and mouse. To our knowledge, few such examples of species-specific splicing events within a single gene have been reported. One such case involves a major structural protein of lens, αA-crystallin, in which an extra exon was expressed in rodents and certain other mammals but not in primates, birds, or other vertebrates (30). Differences in splicing were also observed in the adenosine phosphoribosyltransferase gene of two distantly related Drosophila species (31). We speculate that changes in splicing patterns within a gene may represent another mode of eukaryotic gene evolution, in addition to well-documented cases of gene duplication and divergence.
The functional implications of these splicing events for ALAS activity are, as yet, unclear. From a comparison of all reported sequences for ALAS isoforms (6), it was noted that region 2 of ALAS enzymes exhibited the least conservation both between species and between isozymes within a single species. Our demonstration of species-specific splicing within this region of the erythroid ALAS protein, and thus the prediction of alternatively expressed peptides (Fig. 6), again raises the question of the region's functionality in eukaryotic ALAS expression. This domain is probably not directly involved in catalysis since it is absent in bacterial ALAS proteins (6, 24). If this region is functionally dispensable, the species differences in splicing may represent neutral intermediates in ALAS gene evolution. On the other hand, it is possible that this region may serve to modulate the function of domains important in catalysis or mitochondrial import, as the presence or absence of the 37 amino acids encoded by exon 4 could alter ALAS tertiary structure. Precedent for such a phenomenon exists in the case of erythroid ankyrin isoforms 2.1 and 2.2, which differ by splicing insertion or deletion of an internal peptide (32, 33). These two isoforms exhibit different affinities for spectrin and band 3, even though the binding sites appear to localize outside the alternatively spliced region (34).
Although circulating reticulocytes have lost their nucleus and most of their cytoplasmic organelles, we have demonstrated that the mRNA for erythroid ALAS is still retained in sufficient quantity to allow cloning and sequencing. This ability to amplify erythroid ALAS-specific sequences from peripheral blood reticulocytes will facilitate future studies of ALAS structure and function in erythroid disorders.