Structure and Function of the Chromogranin A Gene: Clues to Evolution and Tissue-Specific Expression*


Chromogranin A is the index member of a family of acidic proteins stored and released throughout the neuroendocrine system with peptide hormones and neurotransmitters. To better understand its functional domains, its evolutionary lineage, and the basis of its tissue-specific pattern of expression, we obtained a mouse chromogranin A cDNA and used it to isolate the chromogranin A gene from the mouse genome. We then characterized the gene's exon/intron structure, and the structure and function of its 5'-regulatory region (promoter/enhancer). The chromogranin A gene was complex, its eight exons and seven introns spanning about 11 kilobase pairs. The eight exons displayed some correspondence to putative functional domains suspected within the cDNA-deduced primary structure of the protein. Three exons also displayed both length and sequence homology to exons in another member of the chromogranin/secretogranin family, chromogranin B, suggesting an evolutionary relationship. A 1.2-kilobase pair genomic fragment just 5' of the coding region was able to program cell type-specific gene expression in a transfection/reporter system. Both pituitary corticotropes and adrenal chromaffin cells recognized this promoter. The promoter possessed some known consensus transcriptional control elements (TATA box, cyclic AMP-response element, and Sp1 site), but otherwise novel transcriptional control elements seem to be operative. The results suggest that this complex gene is encoded by exon modules with evolutionary links to homologous modules in other chromogranin/secretogranin protein family members, and that the 5'-flanking region of the gene is sufficient to confer neuroendocrine tissue-specific expression of the chromogranin A gene.

[...] neurotransmitter storage vesicles throughout the neuroendocrine system (3-5).
Chromogranin A is the topic of many unanswered questions. Although it exists as a single-copy gene in both the rodent and human genomes (6-10), the structural organization of the gene is unknown. The cDNA-deduced primary structure of chromogranin A suggests a number of functional domains in the protein (6-8), but whether such domains correspond to discrete exons has not yet been investigated. Finally, although the chromogranin A gene is selectively expressed at high levels within the neuroendocrine system (3-5), the basis of this tissue-specific expression is unexplored.
To address these issues, and ultimately to be able to manipulate the chromogranin A gene in an intact organism, we have isolated the gene from the mouse genome. We established its exon/intron structure, isolated and sequenced its 5'-regulatory region, and established its cell type-specific promoter activity by transfection and expression after reporter construction. Our results suggest that this complex gene is constructed of exon modules with evolutionary links to homologous modules in other chromogranin/secretogranin family members, and that the region just 5' of exon 1 is sufficient to confer neuroendocrine tissue-specific expression of the chromogranin A gene.

Mouse Chromogranin A cDNA Isolation-A mouse brain cDNA library in the vector λ-ZAP (Stratagene, La Jolla, CA) was screened by plaque hybridization with a nick translation (11)-labeled full-length rat chromogranin A cDNA (6). Forty-three positive clones were found in the 200,000 clones screened. Restriction digestion established that 3 of these were greater than 1.8 kb¹ in length, and therefore likely full-length (6). The longest (clone 5-3; 1892 bp) of these clones was sequenced by the dideoxy chain termination (12) method (Sequenase Kit, United States Biochemicals, Cleveland, OH). Synthetic 17-mer oligonucleotide sequencing primers were synthesized to enable progressive sequencing runs spanning each strand of the entire cDNA (Model 381A automated oligonucleotide synthesizer, Applied Biosystems, Foster City, CA).
Sequence comparisons were made with Pustell sequence analysis programs (International Biotechnologies, Inc., New Haven, CT) using an IBM-PC-AT microcomputer.
Mouse Chromogranin A Genomic Clone Isolation-The ³²P nick translation (11)-labeled full-length mouse chromogranin A cDNA was used to screen 160,000 clones of a mouse (strain AJ) genomic DNA library in the vector sCos-1 (14, 15). Colony hybridization (13) yielded 42 positive clones. Since the poly-Gln trinucleotide repeat ([CAG]n) region of rodent chromogranin A recognizes dispersed sequences throughout the genome (6, 7), we rescreened the 42 positive clones with a ³²P-end-labeled (13) synthetic 20-mer oligonucleotide corresponding to the farthest 5' portion of the 5'-untranslated region of the mouse chromogranin A cDNA: 5'-GCTGCTGCCGCCGCCACCGC-3'; 3 of the 42 clones remained positive. Regions of overlap among the 3 positive clones were established by restriction digestion, and 2 overlapping clones spanning a total of approximately 52 kb were restriction-mapped at EcoRI, XhoI, and BamHI sites using the phage T3 and T7 promoters flanking the inserts (14, 15; Gene Mapping Kit, [...]).

The exon/intron structure of the gene was established by resequencing the mouse chromogranin A coding region, but this time in exons within genomic DNA. As exon/intron borders were crossed, new primers were synthesized corresponding to the next exon, and sequencing continued. Positioning of each exon on the established genomic clone maps (14, 15) was accomplished by restriction digestion of the genomic clones with 9 different enzymes (EcoRI, XhoI, BamHI, EcoRV, HindIII, XbaI, SmaI, AccIII, and PstI), followed either by Southern blotting (13), using exon-specific ³²P-end-labeled (13) synthetic oligonucleotides (the previously mentioned sequencing primers) as probes, or by subcloning some of the genomic restriction fragments (n = 7) into pBluescript (Stratagene), after which the inserts were sequenced (12).

¹ The abbreviations used are: kb, kilobase pair; bp, base pair; ACTH, adrenocorticotropic hormone.
Promoter Analysis-The genomic DNA 387 bp 5' of the ATG initiation codon (or 230 bp 5' of the longest cDNA 5' end) was sequenced on both strands by the dideoxy chain termination method (12).
To establish the mRNA cap site, we performed primer extension (13) on an mRNA template, using an oligonucleotide primer corresponding to nucleotides -72 through -103 (numbering 5' of the ATG initiation codon) of the longest cDNA: 5'-GCTACCAAAGCTCTGCACCGGGAACACTG-3'. The source of the mRNA (13) was the mouse endocrine cell line AtT-20 (16), while the source of the negative control mRNA was mouse liver.

[...] Table I). This fragment was blunted (13) and ligated into the blunted HindIII promoter insertion site of the promoterless luciferase expression reporter vector pSVOALA5 (17). Proper orientation of the promoter insert in the resulting construct (pHJALA5) was verified by appropriate asymmetric double restriction digestion (13). The plasmid (10-30 µg/6-cm culture plate, at 40-60% cell confluence) was transfected into both neuroendocrine cells and control nonendocrine cells by the efficient lipofection method (18). The neuroendocrine cell types were the rat pheochromocytoma adrenal chromaffin cell line PC-12 (19) and the mouse pituitary corticotrope cell line AtT-20 (16). The nonendocrine (control) cell lines transfected were the mouse fibroblast cell line NIH-3T3 and the T antigen-transformed monkey kidney cell line COS (20). In parallel transfections into all cell types, a positive control plasmid was used with luciferase transcription driven by the SV-40 early promoter: pSV2ALA5 (17). To correct for transfection efficiency differences across cell types, each transfection also involved co-transfection with a control reporter plasmid, pRSV-CAT (21). Cell lysates (17) were prepared 48 h after transfection and assayed for protein (22), the luciferase reporter (17), and the chloramphenicol acetyltransferase reporter (13). Luciferase results were normalized both to cell protein content and to cell chloramphenicol acetyltransferase content.

FIG. 1. Sequencing strategy and partial restriction map for the mouse brain chromogranin A cDNA. Each arrow represents the origin, direction, and extent of one sequencing reaction and gel lane (12). The positions of the ATG initiation codon and TGA termination codon are shown. The terminal 5'-EcoRI and 3'-XhoI sites derive from the cDNA cloning vector, λ-ZAP. Scale in base pairs.

[...] revealed a full-length 463-amino acid open reading frame bounded by initiation (ATG) and termination (TGA) codons. This cDNA contained a remarkably long 157-bp 5'-untranslated region. The deduced primary structure of mouse chromogranin A (Fig. 2) suggests an 18-residue hydrophobic N-terminal signal peptide, followed by a 445-amino acid mature protein. Its sequence homology to rat chromogranin A (6) is 16/18 residues identical (89% identity) in the signal peptide, followed by 397/445 residues identical (89% identity) in the mature protein. Analysis of the primary structure of mouse chromogranin A suggests several structural domains in the protein (Fig. 3) also shared by rat chromogranin A (6, 7, 23), including the signal peptide, 2 cysteines toward the N terminus (at residues 17 and 38) which form an intramolecular disulfide loop (24), an 11-residue polyglutamine region, one consensus site (NQS, residues 112-114) for N-linked glycosylation (25), three sets of oligoglutamic acid clusters, a 52-residue pancreastatin (26) homology (residues 259-311) with nearby single basic residues, and 8 sets of paired basic residues, likely recognition sites for proteolytic cleavage (27).
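The identity figures quoted above follow directly from the residue counts. The short Python sketch below simply redoes that arithmetic; the function and variable names are illustrative and are not part of the original analysis, which used the Pustell programs.

```python
# Residue counts as reported in the text for mouse versus rat chromogranin A:
# (identical residues, residues compared).
SIGNAL_PEPTIDE = (16, 18)
MATURE_PROTEIN = (397, 445)


def percent_identity(identical: int, length: int) -> float:
    """Percent identity for `identical` matches over `length` compared residues."""
    if length <= 0:
        raise ValueError("length must be positive")
    return 100.0 * identical / length


signal_pct = percent_identity(*SIGNAL_PEPTIDE)
mature_pct = percent_identity(*MATURE_PROTEIN)

# The open reading frame is the signal peptide plus the mature protein.
orf_length = SIGNAL_PEPTIDE[1] + MATURE_PROTEIN[1]

print(f"signal peptide identity: {signal_pct:.0f}%")
print(f"mature protein identity: {mature_pct:.0f}%")
print(f"open reading frame: {orf_length} residues")
```

Both regions round to 89% identity, and the two segment lengths sum to the 463-residue open reading frame.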
Structure of the Mouse Chromogranin A Gene-The chromogranin A gene was restriction mapped onto 2 overlapping mouse cosmid clones spanning about 52 kb (Fig. 4). This complex gene is composed of 8 exons separated by 7 introns, and the coding region spans about 11 kb. Sequence analysis of the exon/intron borders (Fig. 5) revealed good agreement of the borders with customary splice donor/acceptor (GT donor/AG acceptor) consensus sequences (28).
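As an illustration of what agreement with the GT donor/AG acceptor consensus means, the following Python sketch tests whether an intron begins with GT and ends with AG. The intron sequences are invented for the example and are not taken from the chromogranin A gene; the function name is likewise hypothetical.

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """True if the intron starts with GT (splice donor) and ends with AG (splice acceptor)."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


# Hypothetical intron sequences, invented for illustration only.
toy_introns = {
    "toy_intron_1": "gtaagtctgaccttttcag",
    "toy_intron_2": "gtgagtccctttctcccag",
    "toy_broken": "gcaagtctgaccttttcac",
}

results = {name: follows_gt_ag_rule(seq) for name, seq in toy_introns.items()}
for name, ok in results.items():
    print(f"{name}: {'consensus' if ok else 'non-consensus'}")
```

This checks only the two invariant dinucleotides; the fuller consensus described in (28) also weighs neighboring positions, which the sketch does not attempt.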
Chromogranin A's exons (Fig. 3) were at least partially congruent with putative functional domains in the primary structure: exon 1 (amino acid residues -18 to -3) encoded 16 of the 18 signal peptide amino acids; exon 2 (residues -3 to +13) encoded the N terminus of the mature protein; exon 3 (residues +14 to +45) spanned the relatively hydrophobic (6) intramolecular disulfide loop; exon 4 (residues +45 to +68) spanned a region without clear distinguishing features; exon 5 (residues +68 to +121) encoded the polyglutamine region as well as the NQS glycosylation site; exon 6 (residues +121 to +259) encoded an oligoglutamic acid cluster and two di- or tribasic sites; exon 7 (the longest coding exon, residues +259 to +418) began at the N terminus of pancreastatin and included not only pancreastatin but also five dibasic sites and two oligoglutamic acid clusters; and exon 8 (residues +419 to +445) encoded the C terminus with one dibasic site.
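The residue ranges above can be converted to per-exon residue counts with a little care about the numbering, which has no residue 0 (signal peptide residues run -18 to -1, mature protein residues +1 to +445). The Python sketch below is a minimal illustration with hypothetical names; a residue whose codon is split by an exon border appears in both neighboring ranges and is therefore counted in both exons.

```python
# Residue ranges per exon as given in the text (first residue, last residue).
EXON_RESIDUES = {
    1: (-18, -3), 2: (-3, 13), 3: (14, 45), 4: (45, 68),
    5: (68, 121), 6: (121, 259), 7: (259, 418), 8: (419, 445),
}


def residues_spanned(first: int, last: int) -> int:
    """Count residues from `first` to `last` inclusive, with no residue 0."""
    if first == 0 or last == 0 or first > last:
        raise ValueError("invalid residue range")
    count = last - first + 1
    if first < 0 < last:
        count -= 1  # the range crosses the -1/+1 boundary; skip the missing 0
    return count


spans = {exon: residues_spanned(*rng) for exon, rng in EXON_RESIDUES.items()}
longest_exon = max(spans, key=spans.get)

for exon, n in spans.items():
    print(f"exon {exon}: {n} residues")
print(f"longest coding exon: {longest_exon}")
```

Under this counting, exon 1 covers 16 signal peptide residues, exons 2 and 3 cover 16 and 32 residues, exon 8 covers 27, and exon 7 is the longest.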
Analysis of homologies (Fig. 6) between the exons of mouse chromogranin A and mouse chromogranin B (29) reveals intriguing sequence similarities in three exons. Chromogranin A and B exons 2 each encode 16-residue domains at the N termini of the mature proteins which share the sequence PV at residues +1 and +2. Exons 3 each encode 32-residue domains sharing 41% (13/32 residues) sequence identity, including the 2 cysteine residues that form the intramolecular disulfide loop (24). Chromogranin A exon 8 and chromogranin B exon 5 encode 25-27-residue C-terminal domains with 44% (11/25 residues) sequence identity.
One site of polymorphism differentiated the mouse chromogranin A coding (exon) sequences in the cDNA versus the genomic DNA: while the (CAG)n trinucleotide repeat region in the cDNA encoded an 11-residue polyglutamine region (Figs. 2 and 3), the same (CAG)n region in the genomic clone encoded an 18-residue polyglutamine region.
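The length polymorphism described here amounts to a difference in the number of tandem CAG codons. A minimal Python sketch of how such a repeat might be counted is given below; the flanking bases are invented, only the repeat counts (11 and 18) mirror the text, and the function name is hypothetical.

```python
import re


def longest_cag_run(dna: str) -> int:
    """Number of CAG units in the longest uninterrupted (CAG)n run."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Toy sequences: flanking bases are invented for illustration.
cdna_like = "GAGCCC" + "CAG" * 11 + "GAGGAA"
genomic_like = "GAGCCC" + "CAG" * 18 + "GAGGAA"

print("cDNA-like repeat units:   ", longest_cag_run(cdna_like))
print("genomic-like repeat units:", longest_cag_run(genomic_like))
```

The sketch ignores reading frame and would not count glutamines encoded by CAA, so it is a rough check on repeat length rather than a translation.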
Mouse Chromogranin A Promoter-Sequencing the genomic DNA just upstream of exon 1 (Fig. 7) revealed consensus sequences for a TATA box (TATAAAA) promoter element (28) [...] Sp1 sites (GGGCGG; 31) at positions -77 to -82 and -112 to -117. A CAAT box homology was not found (28). No other perfect (100%) consensus matches for other known transcriptional control regions were apparent up to nucleotide position -230. The mouse promoters for chromogranins A (Fig. 7) and B (29) shared a TATA box, a cyclic AMP-response element, and an Sp1 site, but were otherwise distinct.
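A search for perfect consensus matches of this kind can be sketched in a few lines of Python. The example below scans one strand of a made-up upstream sequence for the two motifs named in the text (TATAAAA and GGGCGG) and reports positions relative to the cap site, taking the last upstream base as -1. The sequence, the function name, and the single-strand scan are all assumptions of the sketch; it does not reproduce the actual promoter or the positions reported above.

```python
MOTIFS = {"TATA box": "TATAAAA", "Sp1 site": "GGGCGG"}


def find_motifs(upstream: str, motifs=MOTIFS):
    """Return (name, first, last) for each exact motif match.

    `upstream` is read 5' to 3' and ends immediately before the cap site,
    so its final base is position -1.
    """
    seq = upstream.upper()
    n = len(seq)
    hits = []
    for name, motif in motifs.items():
        start = seq.find(motif)
        while start != -1:
            first = start - n                # position of the motif's 5' base
            last = first + len(motif) - 1    # position of the motif's 3' base
            hits.append((name, first, last))
            start = seq.find(motif, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


# Hypothetical 35-bp upstream sequence, invented for illustration.
toy_upstream = "ACGT" + "GGGCGG" + "ACACACAC" + "TATAAAA" + "CTCTGAGTCA"
hits = find_motifs(toy_upstream)

for name, first, last in hits:
    print(f"{name}: {first} to {last}")
```

Scanning the reverse complement as well would be needed to find elements in the opposite orientation.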

Primer extension of a neuroendocrine (AtT-20 cell) mRNA revealed a unique transcriptional start (cap) site for the chromogranin A message (nucleotide +1, Fig. 7), also establishing the numerical and spatial positions of putative promoter elements. When the cap site is compared to the 5' portion of the cDNA sequence (Fig. 2), it becomes apparent that the cDNA was only one nucleotide short of being full-length to the transcription initiation site.
When the 1.2-kb putative 5'-regulatory region of the gene (an EcoRI-PstI fragment extending from about 1.2 kb upstream of the cap site to 54 bp downstream of the cap site) was inserted in the proper orientation into a luciferase reporter vector (Table I) [...] promoterless luciferase negative control plasmid (pSVOALA5) was virtually without expression in these studies.

Intron/exon borders in the mouse chromogranin A gene. The borders were established by sequencing the chromogranin A coding region using the genomic DNA cosmid clones as templates (12). Nucleotides in coding exons are capitalized, while nucleotides in noncoding introns are in lower case. In the exons, nucleotides are grouped by triplet codon. The exon/intron borders are in reasonable agreement with consensus sequences for splice donor/acceptor sites (28). The nucleotide numbers are based on the longest (most 5') cDNA clone obtained (clone 5-3; Figs. 1 and 2).

DISCUSSION
Mouse chromogranin A is similar to rat chromogranin A (6) in primary structure (Fig. 6) as well as in the conservation of several structural domains (Fig. 3). However, some differences between mouse and rat chromogranin A were noted: (a) the mature mouse chromogranin A was 1 amino acid residue longer (445 versus 444 residues), (b) mouse chromogranin A, unlike rat chromogranin A (6), contained no RGD integrin sequence (32), (c) although both rat (6, 7, 23) and mouse (Figs. 2 and 3) chromogranin A contain polymorphic polyglutamine domains encoded by (CAG)n trinucleotide repeats, the polyglutamine region in the rat (16-20 residues) differs in length from that in the mouse (11-18 residues), and (d) mouse chromogranin A contains 8 dibasic cleavage sites (Figs. 2 and 3), versus 10 dibasic sites in rat chromogranin A (6, 7, 23). Chromogranins A generally display considerable interspecies sequence homology, especially at the termini (6, 23). The pancreastatin domain in mouse chromogranin A has nearby single basic residues, rather than dibasic or tribasic clusters (Fig. 3). However, prohormone processing proteases may selectively recognize certain single basic residues in chromogranin A (33).
Chromogranin A's eight exons showed substantial correspondence to putative functional domains deduced from the cDNA sequence (Fig. 3). In addition, three of chromogranin A's eight exons (exons 2, 3, and 8) displayed both sequence and length homology to three of chromogranin B's five exons (29; Fig. 6). Perhaps these exons were common to an evolutionary ancestor of chromogranins A and B, ultimately giving rise to separate chromogranins by gene duplication and subsequent mutation (34). Also of note, the C-terminal homologous chromogranin A exon 8 and chromogranin B exon 5 were each bounded or closely apposed by dibasic sites, suggesting that the ability of this domain to be liberated by proteolytic cleavage may be conserved across the chromogranins. This C-terminal exon displays additional partial sequence homology to other members of the chromogranin/secretogranin family, including secretogranin II (chromogranin C; 35, 36) and secretogranin III (brain clone 1B1075; 37).
The transient expression results (Table I) suggest that information sufficient to confer both strong and tissue-specific expression of the chromogranin A gene is contained within the genomic DNA immediately (1.2 kb) 5' of the chromogranin A coding region (Fig. 4). Indeed, this promoter/enhancer region was as potent as the SV-40 early promoter in chromaffin cells, and 18-fold more potent in anterior pituitary corticotropes, but was virtually inactive in nonendocrine cells. Initial sequence analysis of this promoter revealed only a few exact sequence homologies to known consensus promoter/enhancer elements, such as the TATA box (28), the cyclic AMP-response element (30), and Sp1-binding site (31). Thus, there are no doubt novel transcriptional control elements in cis within this promoter, which may be of fundamental importance in determining the acquisition of the neuroendocrine phenotype, or patterns of gene expression which unite peptide and neurotransmitter storing/releasing cells.

[Figure: aligned amino acid sequences of mouse and rat chromogranin A, annotated with exon borders.] The vertical arrows indicate amino acid residues corresponding to exon borders; a single vertical arrow indicates that the exon borderline falls within a single amino acid's triplet codon, while a double arrow indicates that the exon borderline falls between triplet codons. Dashes indicate amino acid residues conserved between mouse and rat chromogranin A. Dots indicate gaps. Bold letters indicate amino acid residues conserved between mouse chromogranins A and B. Asterisks indicate di- or tribasic sites. Diamonds indicate a potential N-glycosylation site (NQS). The symbol @ indicates cysteine residues forming the intramolecular disulfide loop. Amino acid residue numbers in the mature proteins are given in the right column; signal peptide residues are assigned negative numbers. mCgA, mouse chromogranin A; rCgA, rat chromogranin A; mCgB, mouse chromogranin B.

TABLE I. Activity of the mouse chromogranin A (CgA) promoter/enhancer
Activity of the mouse chromogranin A promoter/enhancer (1.2-kb 5' fragment in the construct pHJALA5) assessed by luciferase reporter assay after lipofection-mediated transfection into neuroendocrine cells (PC-12, AtT-20) or control nonendocrine cell lines (NIH-3T3, COS). Each transfection included a control plasmid encoding chloramphenicol acetyltransferase (pRSV-CAT), to correct for differences in transfection efficiency across cell lines. The positive control plasmid, pSV2ALA5, expressed luciferase under the control of the SV-40 early promoter. The negative control plasmid, pSVOALA5, contained the luciferase coding region but no mammalian promoter. After 48 h, transfected cells were extracted and assayed for protein as well as the luciferase and chloramphenicol acetyltransferase reporters. The luciferase reporter results are normalized both to cell protein content and to cell chloramphenicol acetyltransferase content. For each cell type, results are normalized to the activity of the SV-40 early promoter (100%).
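The normalization scheme described for Table I can be written out as a short Python sketch. It assumes that "normalized both to cell protein content and to cell chloramphenicol acetyltransferase content" means dividing the luciferase signal by each quantity in turn; the text does not spell out the formula, so that reading, the function names, and all readings below are assumptions made for illustration, not data from the study.

```python
def normalized_activity(luciferase: float, protein_mg: float, cat: float) -> float:
    """Luciferase signal per unit protein per unit CAT activity (assumed formula)."""
    if protein_mg <= 0 or cat <= 0:
        raise ValueError("protein and CAT values must be positive")
    return luciferase / protein_mg / cat


def percent_of_sv40(test: float, sv40: float) -> float:
    """Express a normalized activity relative to the SV-40 early promoter (100%)."""
    if sv40 <= 0:
        raise ValueError("SV-40 reference activity must be positive")
    return 100.0 * test / sv40


# Invented readings for a single cell type, in arbitrary units.
cga = normalized_activity(luciferase=5000.0, protein_mg=0.5, cat=2.0)
sv40 = normalized_activity(luciferase=1000.0, protein_mg=0.5, cat=1.0)
relative = percent_of_sv40(cga, sv40)

print(f"CgA promoter activity: {relative:.0f}% of SV-40 early promoter")
```

Because the SV-40 reference is measured separately in each cell type, values computed this way are comparable across cell lines only as percentages of that within-cell-type reference.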