Abundant Adrenal-specific Transcription of the Human P450c21A “Pseudogene”*


Human adrenal steroid 21-hydroxylase (P450c21) is encoded by the CYP21A1 (21B) gene located in the class III region of the HLA locus. A tandemly duplicated gene designated CYP21A1P (21A), which lies 30 kilobases upstream, contains several point mutations and an 8-base pair deletion so that it cannot encode P450c21 protein; as a result, it is generally considered to be a pseudogene. We previously showed that two additional genes, XA and XB, lie on the opposite strand of DNA overlapping the 3'-ends of the 21A and 21B genes. We have now identified a third pair of duplicated overlapping genes in this locus, termed YA and YB, whose transcriptional orientation is the same as 21A and 21B and opposite to XA and XB. YA transcripts use the 21A promoter, have 5'-ends that are similar to 21B mRNA, and have ~10-20% of the abundance of 21B transcripts, but have unique 3'-ends. The YA gene encodes a 7.5-kilobase RNA that overlaps XA completely and a 3.0-kilobase RNA that excludes most of XA. The YB gene appears to be similar in size and organization to YA. The YA and YB genes extend beyond the limit of the duplication in this locus; hence, their cDNAs are distinguishable by differences in their 3'-sequences. YA and YB transcripts are expressed only in the fetal and adult adrenal glands, but their cDNAs do not contain a long open reading frame. Although the function of these genes is not yet clear, the complex genetic organization of three overlapping genes (21/X/Y) appears to be unique among higher eukaryotes. As YA transcription is initiated by the 21A 5'-flanking DNA and includes 21A sequences, the designation of 21A as a "pseudogene" merits reconsideration.

P450c21 is an adrenal-specific microsomal cytochrome P450 required for the synthesis of both glucocorticoids and mineralocorticoids (1).
Genetic lesions in the human P450c21 gene locus cause congenital virilizing adrenal hyperplasia (2, 3), a common disorder affecting ~1 in 12,000 persons (4); thus, this gene locus has been the subject of intensive study. The human genome contains two genes formally termed CYP21A1P and CYP21A1 (5) and generally referred to as 21A and 21B. These genes are located in the class III region of the HLA locus on chromosome 6p21.1 and are duplicated in tandem with the C4A and C4B genes, which encode the two forms of the fourth component of serum complement. These four genes are arranged 5'-C4A/21A/C4B/21B-3' and have the same transcriptional orientation (Fig. 1, upper panel). The human 21B gene encodes the P450c21 protein. The 21A gene has 98% nucleotide sequence identity to 21B (6-8), reflecting concerted evolution (9-11) at this locus, but 21A has various mutations, producing frameshifts and three premature stop codons (6-8); hence, it cannot encode a P450 protein. Gene conversion events in the duplicated C4 (12-15) and 21 genes are common, resulting in ~85% of the mutations causing congenital virilizing adrenal hyperplasia (2, 3, 16-21) and suggesting an extremely high degree of recombinational activity in this locus.

* ... and F32-DK08325 (to S. E. G.). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
We recently identified a pair of genes termed XA and XB encoded on the DNA strand opposite from the 21A and 21B genes and overlapping the last exon of each (Fig. 1) (22-24). The XB gene encodes a protein with remarkable homology to the extracellular matrix protein tenascin (23-25). The XA gene gives rise to a stable adrenal-specific transcript of unknown function; it is >99% identical to XB, but is truncated at its 5'-end and contains a deletion closing the reading frame (23).
We now report the adrenal-specific expression of another pair of duplicated genes, operationally termed genes YA and YB, with the same transcriptional orientation as 21A and 21B and overlapping the XA and XB genes, respectively. The YA gene utilizes the 21A promoter and extends to a point just upstream from the C4B gene. The YA gene encodes 7.5- and 3.0-kb transcripts that differ by the inclusion of a 4.5-kb intron, which spans all but 80 bases of the XA gene. The YB gene appears to be similar to the YA gene, although its 3'-end is distinct, and its exonic regions have extensive overlap with exons of the XB gene. The complex organization of three overlapping genes within this locus appears to be unique in the human genome. The function of the X and Y genes is not clear. The transcription of the YA locus may facilitate single-stranded DNA breaks that could be involved in gene conversion events. Furthermore, RNA/RNA hybrids may form between 21B and X mRNAs and between X and Y RNAs; such hybrids might influence splicing, transport, or translation of the protein-coding mRNAs within this locus.

[FIG. 1. Panel label: Duplication Limits]

extracted twice with phenol/CHCl3; precipitated with ethanol; and quantitated by absorption spectrometry at A260/A280. Polyadenylated RNA was selected by oligo(dT) chromatography and subsequently size-selected. Reverse transcription of the largest fractions was performed with Superscript reverse transcriptase (Bethesda Research Laboratories), and second strand synthesis was performed according to the manufacturer's recommendation. After addition of EcoRI adapters, cDNAs were ligated to λZap arms (Stratagene, La Jolla, CA), packaged, and transformed into Escherichia coli XL-1 Blue (Stratagene). The unamplified library was plated at a density of 5 × 10'/plate, and duplicate lifts were made on nitrocellulose filters. Filters were probed in 50% formamide, 5 × SSC, 1 × Denhardt's solution, 0.1% SDS at 42 °C. Probes were labeled to a specific activity of >5 × 10' cpm/μg using random primers (LKB Biotechnology Inc.). Washing was carried out in 0.1 × SSC, 0.1% SDS for 15 min at room temperature and twice for 15 min at 65 °C. Positive plaques were picked and purified by two additional rounds of plating and probing. Bacteriophage DNA was prepared by the method of Helms et al. (27). Bluescript plasmids were rescued from λZap phage using R408 helper phage according to the manufacturer's protocol.
DNA Sequencing-Genomic and cDNA fragments for sequencing were subcloned into pBluescript vectors (Stratagene), and DNA was prepared by alkaline lysis. Sequencing was carried out on double-stranded templates (28) with Sequenase (United States Biochemical Corp.) and [35S]dATP as recommended by the manufacturer, except that termination reactions were carried out at 42 °C. Reaction products were analyzed by electrophoresis on 5% polyacrylamide gels containing 7 M urea and subsequent autoradiography. Sequence analysis was carried out using the DNA Inspector (TEXTCO, West Lebanon, NH) and the University of California-San Francisco Department of Biochemistry's "Eugene" sequence analysis software.
Northern Blotting-Total RNA was prepared from human fetal tissues quickly frozen in liquid N2 as described above. Glyoxalated RNA samples were run in phosphate electrophoresis buffer at 75 V for 4 h and transferred for 24 h in 20 × SSC to nylon membranes (Amersham Corp.). After UV cross-linking, blots were hybridized and washed as described above for nitrocellulose filters, except that the concentration of SDS was raised to 1% for both hybridization and washing.
RNase Protection-RNase protection experiments were done essentially as described (26). Templates were cloned into pBluescript and linearized, and antisense probes were synthesized with [32P]UTP using bacteriophage T7 or T3 RNA polymerase. After DNase I treatment, phenol/CHCl3 extraction, and ethanol precipitation, 5 X

Cloning of YA cDNAs-Northern blots of fetal adrenal RNA probed under high stringency with the XA region between 21A and C4B revealed transcripts larger than those of the XA RNA (23). Similar transcripts are also seen on long exposures of Northern blots of human fetal adrenal RNA probed with 21B cDNA (Fig. 2, lane 2) (29). To identify these larger transcripts, we used genomic probe G1, a 483-bp PstI-XhoI genomic fragment lying at the 5'-end of the XA gene (Fig. 1, upper panel), to screen 350,000 clones of our unamplified cDNA library constructed from human fetal adrenal RNA. From 39 positive clones, we purified three longer than 2.5 kb (the size of XA transcripts). Clone YA-3 was 2.6 kb long and had a 5'-sequence identical to that of 21A, beginning at base -80 (numbered according to Ref. 6). This cDNA was identified as a 21A transcript because its first exon contained the seven established point differences with 21B (6-8). The genomic region encoding the 3'-end of YA-3 was found 8 kb downstream between the cap sites of the XA and C4B genes. The sequence of the remainder of this cDNA clone showed the intron/exon organization diagrammed in Fig. 1A. YA-3 was spliced in a fashion analogous to 21B, except that the first two introns were included, and a splice junction was present 24 bp downstream from the TGA codon in exon 10 of 21A (bases 2704-2706 in Ref. 6). The ninth exon of YA-3 overlaps the cap site of the XA gene (23). The final two exons of YA-3 lie between XA and C4B. Each splice junction conformed to the GT/AG rule. Clone YA-3 was partial-length because it did not contain a poly(A) tail, although there is a canonical polyadenylation signal in the genomic sequence 300 bp downstream from the end of this clone (bases -682 to -677 in Ref. 23 and Fig. 2). To delineate the 3'-end of the YA cDNA, the 39 original positive clones were rescreened with a 1-kb PCR-amplified genomic probe (see G2 in Fig. 1).
This probe extended from the limit of the duplication to the polyadenylation signal and identified four additional clones. One of these, clone YA-17, was polyadenylated, containing a poly(A) tail 12 bases downstream from the predicted AATAAA polyadenylation signal. As shown in Fig. 1A, the last two introns were excised from YA-17 as in YA-3; however, the 5'-most exon of this clone continued 240 bp upstream without splicing. The intron/exon organization of the other three YA clones was the same as that of YA-17, extending the sequence of this exon 1.2 kb upstream from the alternate splice junction present in YA-3 (Fig. 1A). The sequences of these cDNAs correspond to the sequences of the corresponding genomic DNA (6-8, 23).

YB Gene-Because the duplicated units of the C4/21/X/Y locus are >98% identical, we suspected that a YB gene might also exist. We reasoned that its 3'-end would lie beyond the limit of the duplication and hence would be distinct. Therefore, we sequenced a 3.3-kb cDNA clone (YB-28) that was identified on the original screen but that did not hybridize to the YA-specific 3'-probe from YA-17. By comparison to the genomic sequence of this region, clone YB-28 encoded portions of two exons (Fig. 1B). The upstream exon was 3.2 kb and extended across the duplication boundary in a fashion analogous to the long YA clones; the remaining 100 bp of YB-28 were separated from the 5'-exon by an intron of ~1.2 kb. This clone was partial-length, containing neither the 5'- nor the 3'-end. To identify the 3'-end of YB transcripts, we screened an oligo(dT)-primed Okayama-Berg fetal adrenal library (a gift of Dr. David Russell) with a 1.4-kb BglII genomic fragment lying beyond the 3'-end of YB-28, identifying a single clone (OB-9). Its sequence contained all of the last two exons of YB, a portion of the long exon identified in YB-28, and a canonical polyadenylation signal followed 15 bp later by a poly(A) tail (Fig. 1B). As with the YA cDNAs, these clones lacked a long open reading frame; however, these YB cDNAs have substantial overlap with XB RNAs, overlapping parts of the six XB exons in this region (Fig. 1B).
Confirmation of the Structures of Y Transcripts in the Human Fetal Adrenal-To confirm the presence of YA transcripts in the adrenal, we reprobed the Northern blot of fetal adrenal RNA with a 600-bp fragment of clone YA-17 extending from the limit of the duplication to the polyadenylation signal (Fig. 1A). Because this probe lies 3' to the duplication limit and upstream from the XA cap site, it is specific for YA transcripts. This probe identified a band of 7.5 kb and a somewhat broader band of 3.0 kb, confirming the presence of both long and short YA RNA species, but this probe did not hybridize to the 2.0-kb P450c21B mRNA (Fig. 2). The difference between the 7.5- and 3.0-kb YA species is equal to the size of the long intron identified from clone YA-3, suggesting that these forms of YA arise by alternate splicing. The 1.5- and 12-kb RNAs, also seen on this blot, may represent other splicing variants, but their identities are unknown.
To determine if there was a short YB RNA analogous to short YA and to determine the relative abundances of the various Y transcripts, we performed RNase protection assays. We used a 611-bp SacI fragment of clone YA-3 encompassing 199 bp of the last exon of 21A (having sequence identical to that of 21B), 333 bp of sequence common to YA and YB, and 79 bp specific to YA (Fig. 3). This probe can distinguish both the long and short forms of both YA and YB. The long forms of YA and YB are ~10-20% of the abundance of P450c21B, while the short forms are very much less abundant. The reason for the apparent difference in abundance of the long and short forms of YA seen in Figs. 2 and 3 is unclear; there may be differences among various adrenal glands, or the long form may be more subject to degradation or may not have transferred completely, leading to disproportionately lower representation on the Northern blot (Fig. 2).
Clone YA-3 included the first two introns of 21A. To determine if the inclusion of these introns is a general feature of YA RNAs, we performed RNase protection assays with a 396-bp EcoRI fragment of YA-3 that includes exons 3 and 4 and 205 bases of intron 2. This probe spans the 8-bp deletion in exon 3 of 21A so that spliced and unspliced forms of P450c21B and YA RNAs can be differentiated. Fig. 4A shows an abundant 156-bp protected fragment corresponding to properly spliced P450c21B mRNA, a 401-bp protected fragment corresponding to a YA species retaining intron 2, and a minor 191-bp species protected by YA without intron 2 or by P450c21B and not cleaved at the 8-bp deletion in exon 3. To verify the inclusion of intron 1, we used an RT/PCR assay using a sense primer in exon 1 and an antisense primer in exon 3 that spanned the 8-bp deletion in the 21A/YA gene. Under stringent conditions, these primers should allow efficient amplification of transcripts from 21A but not from 21B and should amplify any shorter spliced cDNAs preferentially. These primers amplified a single product of 550 bp from the cloned 21A gene (Fig. 4B). Reverse-transcribed adrenal poly(A+) RNA (20 ng) gave products of 550 and 450 bp, but only the larger band (corresponding to RNA retaining both introns) hybridized to a 21A genomic probe (Fig. 4C). Thus, clone YA-3 represents the majority of YA RNA, but the splicing of introns 1 and 2 of 21A is somewhat variable.

FIG. 3. RNase protection assay of Y transcripts. A 694-nucleotide cRNA probe was gel-purified and hybridized to 20 μg of total fetal adrenal RNA or tRNA. Only fetal adrenal RNA protected probe fragments. As diagrammed to the right, short YA mRNAs protected all 611 bases of the probe corresponding to the SacI insert. Short YB transcripts protected a probe fragment truncated at the limit of the duplication. Because the probe was derived from a short YA clone, long Y RNAs protected probe fragments truncated at the splice junctions indicated by inverted triangles (412 bp for long YA and 333 bp for long YB). The intense 199-nucleotide protected fragment arises from P450c21B transcripts plus the 5'-portions of the probe hybridizing to long YA and YB RNAs. Markers indicated to the left are HpaII-cut pBluescript.
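The fragment sizes expected from the 611-bp SacI probe follow from simple addition of its three segments (199 + 333 + 79 bp). The sketch below is illustrative only: the function name and data layout are hypothetical, and the 532-bp value for short YB is inferred here from "truncated at the limit of the duplication" rather than stated in the text.

```python
# Segment lengths (bp) of the 611-bp SacI insert, 5' to 3', as given in the text.
EXON_21A = 199      # last exon of 21A, identical in sequence to 21B
COMMON_Y = 333      # sequence common to YA and YB
YA_SPECIFIC = 79    # sequence beyond the duplication limit, specific to YA


def protected_fragments():
    """Return expected protected fragment sizes (bp) per transcript class.

    Long Y RNAs are cut at the splice junction between the 21A exon and the
    downstream sequence, so they yield two fragments. The short YB value is
    an inference (assumption), not a figure reported in the paper.
    """
    return {
        "P450c21B mRNA": [EXON_21A],
        "short YA": [EXON_21A + COMMON_Y + YA_SPECIFIC],
        "short YB": [EXON_21A + COMMON_Y],  # inferred, see lead-in
        "long YA": [EXON_21A, COMMON_Y + YA_SPECIFIC],
        "long YB": [EXON_21A, COMMON_Y],
    }


if __name__ == "__main__":
    for transcript, sizes in protected_fragments().items():
        print(f"{transcript}: {sizes}")
```

Under these assumptions the 199-bp fragment is shared by P450c21B and both long Y forms, which is consistent with the intense 199-nucleotide band described in the Fig. 3 legend.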
Adrenal-specific Expression of Y Genes-Because the YA and YB RNAs appear to arise from the 21A and 21B promoters, we expected that they, like P450c21 mRNA (26), would be expressed only in the adrenal cortex. To test this, we performed RNase protection assays with a 357-bp FokI fragment encompassing the 3'-end of YA-3 and crossing the duplication boundary. This probe distinguishes YA RNA (357-bp protected band) from YB RNA (97-bp protected band). Both fetal and adult adrenal RNAs protected readily detectable bands of 97 and 357 bp, but RNAs prepared from other tissues did not (Fig. 5). Longer exposures detected very low expression of YA in brain and liver (data not shown). The origins of the ubiquitous minor band at 210 bp and of the abundant adult-specific 220-bp band are unknown.
A. ( N o N A ). Markers are λ/HindIII and φX174/HaeIII. C, autoradiograph of gel shown in B probed with a 1-kb fragment from the 5'-end of YA-3. Exposure was for 2 h without an intensifying screen.

DISCUSSION
We used cDNA cloning, sequence analysis, Northern blotting, RT/PCR amplification, and RNase protection assays to identify the previously undescribed YA and YB transcripts arising from the C4/21/X locus. The YA and YB RNAs are not artifacts due to transcriptional "readthrough," and they do not represent genomic contamination of the RNA we used: these RNAs are spliced in a consistent fashion, are polyadenylated, and are detectable in all human adrenal RNA samples examined. Y gene transcripts had not been recognized on Northern blots previously because of their larger size and lower abundance compared with P450c21B mRNA. A transcript arising from the 21A gene might be expected because the 21A promoter is well preserved. The first 1680 bp of 5'-flanking sequences of 21A and 21B are 98.6% identical, including nearly complete identity in regions that appear to be important for 21B gene transcription (31). Despite this great similarity in the 21A and 21B promoters, there are substantial differences in Y and P450c21 RNA abundances, suggesting that post-transcriptional mechanisms may influence the relative abundances of YA, YB, and P450c21 RNAs. This very high degree of nucleotide sequence identity between the duplicated A and B loci appears to represent an example of concerted evolution driven by frequent genetic exchange (9-11). However, if the YA gene product has a function, there might be additional selective pressure favoring such concerted evolution.
The similarity between the 21A and 21B promoter sequences made it difficult to map the precise cap sites of the YA and YB RNAs as the 5'-ends of YA, YB, and P450c21B RNAs are identical. Both the human (6, 7) and bovine (32) 21B genes have three cap sites; the site 10-11 bp from the ATG translational start codon initiates most P450c21B mRNA transcription, but we cannot tell if all three cap sites are used to initiate Y transcription or whether the minor cap sites might be the primary ones used for Y transcripts.
The YA RNAs include the first two introns of 21A. A portion of the second intron of 21B is included in a common mutant 21B allele containing a gene microconversion of nucleotide 656 from C to G as is found in the normal 21A gene (33). Nuclease protection experiments using a 21B intron 2 probe showed that this single base change prevented normal splicing of P450c21B mRNA in COS-7 cells (33). This probe also protected other bands when hybridized to normal adrenal RNA (33); these probably represent YA RNA. Our RNase protection and RT/PCR analysis of normal human adrenal RNA verified retention of intron 2 in YA transcripts (Fig. 4). Sequences corresponding to the first intron of 21B were also left unspliced in YA (Fig. 4, B and C), even though this sequence is identical in the 21A and 21B genes. This intron might be spliced in a very small population of mRNAs undetectable by our RT/PCR assay.

The function of Y RNAs is not known. About 14% of wild-type human chromosomes 6 carry a deletion within the C4/21/X locus extending from the 5'-flanking DNA of C4A through the 21A and XA genes; similarly, ~15% of alleles causing severe 21-hydroxylase deficiency carry a deletion extending from the middle of 21A to the exactly corresponding point in the 21B gene, again deleting the XA gene (2, 3, 19). Persons homozygous for the former deletion are normal, suggesting that no functional genes reside in this DNA. However, an intact Y gene is preserved in both deletions: the YB gene in the former case and a hybrid YA/YB gene in the latter case. Among >800 alleles analyzed (2, 3), no deletion has been described that disables a Y gene without also disabling its corresponding X gene (for example, a short deletion at the 5'-end of 21A or 21B). These data suggest that preservation of overlapping pairs of X and Y genes may be important.
It is not presently known if the Y RNAs encode proteins. The longest YA open reading frame is initiated by a good ATG consensus sequence and encodes a 290-amino acid protein similar to residues 113-306 of P450c21. However, to use this ATG codon, the translational machinery would have to skip several other good ATG codons, including the "authentic" P450c21 translational start site. Even if the Y RNAs do not encode protein, they may still serve a function. The long forms of YA and YB are potentially able to form double-stranded hybrids with XA or XB RNA, which can also hybridize to P450c21 mRNA. The short form of YA overlaps the first 84 bases and the last 5 bases of XA. Because the first 500 bases of XA are not included in XB transcripts (22, 23), the short form of YA is complementary to XB only over its last few bases so that it is unlikely that XB/short YA hybrids are formed in vivo. Irrespective of function, the transcription of the 21A/YA locus may have important consequences. Transcription transiently unravels DNA, increasing its susceptibility to single-stranded breaks. Such breaks may be involved in the gene conversion events causing congenital virilizing adrenal hyperplasia.
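The question of whether a cDNA contains a long open reading frame can be illustrated with a simple scan for ATG-initiated frames. The following is a minimal sketch, not the analysis software used in this work: the function name is hypothetical, the input is a toy sequence rather than a Y cDNA, only the given strand is scanned, and frames lacking an in-frame stop codon are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end, n_codons) for the longest ATG-to-stop ORF.

    start is the 0-based index of the ATG, end is the index just past the
    stop codon, and n_codons counts coding codons excluding the stop.
    Only the three forward frames are examined (simplifying assumption).
    """
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None:
                # Open a frame at the first ATG encountered.
                if codon == "ATG":
                    start = i
            elif codon in STOP_CODONS:
                # Close the frame and keep it if it is the longest so far.
                n_codons = (i - start) // 3
                if n_codons > best[2]:
                    best = (start, i + 3, n_codons)
                start = None
    return best


if __name__ == "__main__":
    toy = "CCATGAAATTTGGGTAACC"  # toy sequence, not from the paper
    print(longest_orf(toy))
```

A scan of this kind always opens a frame at the first in-frame ATG, which mirrors the point made above: an ORF that begins at a downstream ATG is only plausible if the upstream ATG codons are bypassed.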
There are several examples of overlapping genes in higher eukaryotes. In most cases, the functional consequences of such overlaps are unclear; however, three recent examples suggest possible functions. First, the overlapping basic fibroblast growth factor transcript in Xenopus oocytes forms double-stranded hybrids with basic fibroblast growth factor transcripts, allowing extensive post-transcriptional editing of the basic fibroblast growth factor mRNA by an RNA unwinding activity (34). Second, an antisense transcript from the p53 gene is induced during maturation of cultured murine erythroleukemia cells and interferes with normal processing and transport of p53 mRNA from the nucleus (35). Third, stability of the prespore gene EB4-PSV transcript of Dictyostelium is regulated by an antisense transcript; EB4 is constitutively expressed, but its mRNA only accumulates when the antisense transcript is absent (36). The sequences of our several YA, YB, XA, and XB cDNAs match those of the corresponding genomic DNA exactly, providing no evidence of RNA editing. It is intriguing to speculate that Y transcripts might hybridize to X mRNAs and prevent the formation of X/P450c21 hybrids that could impair P450c21 mRNA processing, translocation, or translation; thus, Y transcripts might "protect" P450c21 mRNA from untoward interactions with X transcripts. Resolution of the potential function of the Y transcripts must await identification of a human adrenal cell line that produces these mRNAs.