The 5'-Flanking Region of the Human Calreticulin Gene Shares Homology with the Human GRP78, GRP94, and Protein Disulfide Isomerase Promoters

Calreticulin (CR) is a calcium binding protein that resides in the endoplasmic and sarcoplasmic reticulum and is reactive with human Ro/SS-A autoimmune sera. We have used human CR cDNA to isolate a 6-kilobase human genomic clone that contains 529 base pairs upstream of the presumed transcription start site, 9 exons, 8 introns, and several hundred base pairs 3' of a polyadenylation sequence. Analysis of the human CR promoter region reveals a number of potential regulatory sites also found in the human GRP78, GRP94, and protein disulfide isomerase promoters, including multiple Sp1 and CCAAT consensus sequences, an AP-2 recognition sequence (absent in protein disulfide isomerase), and multiple GC-rich areas. DNA footprint and gel shift analyses of the CR 5'-flanking region demonstrate an area that is bound by protein found in human but not murine nuclear extracts. This sequence is homologous with previously determined regulatory sequences of the human GRP78 and GRP94 promoters. These data indicate that CR, GRP78, GRP94, and protein disulfide isomerase may in part share transcriptional regulation and suggest that their gene products, while structurally distinct, may have similar functions or co-functions. These observations are of additional interest because all four of these genes encode acidic proteins that localize to the endoplasmic reticulum.

cellular function has not yet been determined. The human CR gene is not highly polymorphic, exists in a single copy on the short arm of chromosome 19, is highly conserved among animal species, and is expressed in a variety of cell lines and tissues (1, 2). To better define the regulation of this gene's expression, we isolated and sequenced a human genomic CR clone and performed DNA footprint studies on its 5'-flanking region.

MATERIALS AND METHODS
Genomic Clone Isolation. A human genomic library derived from the RPMI 8402 T-cell line and constructed in bacteriophage λ2001 was obtained from Dr. Richard Baer, University of Texas Southwestern Medical Center. This library was screened by the duplicate filter lift method with the 1.9-kb CR cDNA previously isolated in our laboratory (1). The cDNA was radiolabeled by the hexamer extension method with hexamer primers (Pharmacia LKB Biotechnology Inc.), [α-³²P]dCTP (Du Pont-New England Nuclear), and Escherichia coli DNA polymerase I (Klenow fragment, Promega Corp., Madison, WI) (5). Filters were prehybridized for 4 h and then hybridized overnight in Hybrisol (Oncor, Gaithersburg, MD) containing 1 × 10⁶ cpm of radiolabeled cDNA per ml. The filters were washed three times for 15 min each in 2× SSC (1× SSC is 0.15 M NaCl and 0.015 M sodium citrate, pH 7.0) containing 0.1% sodium dodecyl sulfate at 37 °C, and then once for 40 min in 0.1× SSC containing 0.1% sodium dodecyl sulfate at 55 °C. Filters were exposed to Kodak X-OMAT AR film between intensifying screens for 20 h and developed on a Konica QX-60A film processor. Areas positive on both duplicate filters were picked and subjected to plaque purification (5).

Southern Filter Hybridization. Six individual clones were digested with various restriction enzymes (Promega Corp.), and the restriction fragments were analyzed by Southern filter hybridization with radiolabeled portions of the 1.9-kb cDNA (5). A genomic clone that contained the 5'- and 3'-most portions of the 1.9-kb cDNA was identified.
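The wash-buffer strengths above follow directly from the 1× SSC definition given in the text. The minimal Python sketch below scales that definition to the 2× and 0.1× washes; the 20× stock concentration used in `stock_volume_ml` is an assumption (a common laboratory stock, not stated in this paper), and the helper names are hypothetical.

```python
# Composition of 1x SSC as defined in the Methods: 0.15 M NaCl, 0.015 M sodium citrate (pH 7.0).
NACL_1X_M = 0.15
CITRATE_1X_M = 0.015


def ssc_composition(strength: float) -> tuple[float, float]:
    """Return (NaCl molarity, sodium citrate molarity) for a `strength`x SSC solution."""
    return strength * NACL_1X_M, strength * CITRATE_1X_M


def stock_volume_ml(final_strength: float, final_volume_ml: float,
                    stock_strength: float = 20.0) -> float:
    """Volume of SSC stock to dilute, from C1*V1 = C2*V2.

    The default 20x stock strength is an assumption for illustration only.
    """
    return final_strength * final_volume_ml / stock_strength


if __name__ == "__main__":
    for strength in (2.0, 0.1):  # the lower- and higher-stringency washes used for the filters
        nacl, citrate = ssc_composition(strength)
        print(f"{strength}x SSC: {nacl:.4f} M NaCl, {citrate:.4f} M sodium citrate")
    print(f"1 L of 2x SSC from an assumed 20x stock: {stock_volume_ml(2.0, 1000.0):.0f} ml")
```

Lowering the SSC strength lowers the salt concentration, which is why the final 0.1× wash at 55 °C is the more stringent step.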
Subcloning. This genomic insert was excised from the λ phage arms with SacI (Promega Corp.), and the complete restriction digestion mixture was ligated into the SacI site of the pTZ18U plasmid (U. S. Biochemical Corp.) with T4 DNA ligase (Promega Corp.) (5). BSJ E. coli cells were transformed with the resulting pTZ18U/genomic clone ligation mixture and plated on ampicillin-containing agar. Duplicate filter lifts were made of the resulting colonies, and these filters were hybridized with radiolabeled 1.9-kb CR cDNA (5). Several positive colonies were picked, and their insert sizes were characterized by SacI restriction enzyme digestion and agarose gel electrophoresis (6). A colony containing an apparently full-length 6-kb DNA insert was used for plasmid amplification and purification (5).
Polymerase Chain Reaction Amplification and Nucleic Acid Sequencing. Synthetic oligonucleotides (oligos) 18-24 bases in length were made corresponding to both strands of the CR cDNA at approximately 300-base pair overlapping intervals. Each upstream 5'-oligo was paired with its downstream 3'-oligo. These oligo pairs were phosphorylated (5) and used in the polymerase chain reaction (PCR) with the DNA Amplification System (Perkin-Elmer), with 5 ng of the 6-kb genomic insert as template. After two ammonium acetate precipitations and one sodium acetate/ethanol precipitation (6), the resulting PCR amplification mixture was ligated directly into SmaI-cut M13mp19 vector. TG1 E. coli cells were transformed with the ligation mixture and plated in top agar. Duplicate filter lifts were made from the resulting plaques, and positive plaques were identified with radiolabeled CR cDNA or with [γ-³²P]ATP end-labeled oligos. Single-stranded DNA corresponding to both strands was generated (7) and sequenced by the Sanger dideoxy method with [α-³²P]dATP and modified T7 DNA polymerase (Sequenase) according to the manufacturer's recommendations (U. S. Biochemical Corp.). Discrepancies between the sequences of the two strands were resolved by repeating the sequencing one or more times. One or more PCR products were sequenced to determine intron-exon boundaries, and four different PCR products were sequenced to obtain the 5'-flanking (promoter) sequence.

Gel Mobility Shift Assay. Nuclear extracts were prepared as described by Dignam et al. (8), with 1 µg each of the proteinase inhibitors leupeptin, pepstatin, and antipain. Gel retardation assays were performed as described previously (9). Approximately 0.02-0.05 mg of nuclear extract protein was incubated with 3 nM DNA probe in a total volume of 20 µl at room temperature.
Ten micrograms of single-stranded and 1 µg of double-stranded calf thymus DNA were used as nonspecific competitors in each reaction. The CR promoter probe was the PstI-AccI fragment, end-labeled with [α-³²P]dCTP by the Klenow fragment (6). The immunoglobulin heavy chain promoter probe was prepared as described previously (9).
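The binding-reaction amounts above can be checked with simple arithmetic. The Python sketch below converts the 3 nM probe concentration in a 20-µl reaction into moles and estimates the nonspecific competitor excess by mass. It assumes a probe length of 233 bp (the -219 to +14 region described in Results, which has no position 0) and an average mass of about 660 g/mol per base pair; both values and the function names are assumptions for illustration.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one double-stranded base pair


def probe_moles(conc_nM: float, volume_ul: float) -> float:
    """Moles of probe in the reaction: concentration (M) times volume (L)."""
    return (conc_nM * 1e-9) * (volume_ul * 1e-6)


def probe_mass_g(moles: float, length_bp: int) -> float:
    """Approximate mass of a double-stranded probe of `length_bp` base pairs."""
    return moles * length_bp * AVG_BP_MASS_G_PER_MOL


if __name__ == "__main__":
    mol = probe_moles(3.0, 20.0)      # 3 nM probe in a 20-ul reaction
    mass = probe_mass_g(mol, 233)     # assumed 233-bp PstI-AccI probe
    competitor_g = 10e-6 + 1e-6       # 10 ug single-stranded + 1 ug double-stranded calf thymus DNA
    print(f"probe: {mol * 1e15:.1f} fmol, {mass * 1e9:.2f} ng")
    print(f"nonspecific competitor excess by mass: {competitor_g / mass:.0f}-fold")
```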
DNA Footprint Analysis. For DNA footprinting, the polyacrylamide gel was soaked in 200 ml of 50 mM Tris buffer (pH 8.0) after electrophoresis, and footprinting was performed in the gel with copper-phenanthroline as described by Kuwabara and Sigman (10). Material from five lanes of retarded DNA probe was eluted, pooled, and separated in a 10% denaturing polyacrylamide gel (6). The G reaction of the CR promoter probe was performed by the Maxam and Gilbert method as described (11).

RESULTS
A human T-cell genomic library was screened with radiolabeled human CR cDNA. The initial library screen produced six genomic clones, only one of which hybridized to the 5'- and 3'-most portions of the 1.9-kb CR cDNA. This genomic insert was excised from the λ phage arms in one piece with SacI and measured approximately 6 kb by agarose gel electrophoresis. Synthetic oligonucleotides corresponding to CR cDNA sequences on both strands, spaced approximately 300 base pairs apart, were used to amplify the genomic insert. In this manner, the various portions of the insert were amplified and subsequently sequenced, allowing construction of the genomic map shown in Fig. 1A.
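The tiling strategy described here (oligos on both strands spaced about 300 bp apart, each upstream sense oligo paired with the next downstream antisense oligo) can be sketched in Python as follows. This is a hypothetical illustration, not the authors' primer-design procedure: the exact pairing scheme, the fixed 20-nt oligo length, and all function names are assumptions. Because the cDNA lacks introns, a genomic PCR product longer than the cDNA span between its two primers points to an intron within that span.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (the antisense strand read 5' to 3')."""
    return seq.translate(COMPLEMENT)[::-1]


def tile_primer_pairs(cdna: str, spacing: int = 300, oligo_len: int = 20):
    """Hypothetical tiling: a sense oligo at each anchor is paired with the antisense
    oligo at the next anchor, giving overlapping cDNA amplicons.

    Returns (start, end, sense_oligo, antisense_oligo) with `end` exclusive.
    """
    anchors = list(range(0, len(cdna) - oligo_len + 1, spacing))
    pairs = []
    for up, down in zip(anchors, anchors[1:]):
        sense = cdna[up:up + oligo_len]
        antisense = reverse_complement(cdna[down:down + oligo_len])
        pairs.append((up, down + oligo_len, sense, antisense))
    return pairs


def inferred_intron_bp(genomic_product_bp: int, cdna_span_bp: int) -> int:
    """Extra length of a genomic amplicon over its cDNA span, attributed to intron(s)."""
    return max(0, genomic_product_bp - cdna_span_bp)


if __name__ == "__main__":
    toy_cdna = "ACGGTCATTG" * 100  # 1,000-nt placeholder, not the CR sequence
    for start, end, fwd, rev in tile_primer_pairs(toy_cdna):
        print(start, end, fwd, rev)
```

Consecutive amplicons in this sketch overlap by one oligo length, so every cDNA position (apart from the ends) is covered by at least one product.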
The transcribed portion of the gene, corresponding to its 1.9-kb cDNA sequence, is contained within 4.5 kb of chromosomal DNA. This is consistent with our earlier Southern filter hybridization data showing that the gene is contained within 6 kb of chromosomal DNA (1). There are 9 exons and 8 introns, whose lengths are shown in Table I. Introns contribute about 2.4 kb to the gene, and four of the eight introns are shorter than 100 bp. As in the vast majority of mammalian genes, most of the introns are type 0 or type I (Table I) (12). The exon-intron junction sequences are highly homologous to the vertebrate exon-intron consensus sequence (Table I) (13).
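Intron type (phase) describes where an intron interrupts the reading frame: type 0 introns fall between codons, type I introns after the first nucleotide of a codon, and type II introns after the second. Given the number of coding nucleotides upstream of an intron (counting from the first base of the start codon), the type is that count modulo 3; introns in untranslated regions have no type in this sense. The Python sketch below is illustrative, and the positions in its example are hypothetical rather than the CR values from Table I.

```python
PHASE_NAMES = {0: "type 0", 1: "type I", 2: "type II"}


def intron_type(coding_nt_upstream: int) -> str:
    """Classify an intron by how many coding nucleotides precede it in the reading frame."""
    if coding_nt_upstream < 0:
        raise ValueError("coding position must be non-negative")
    return PHASE_NAMES[coding_nt_upstream % 3]


def total_intron_kb(intron_lengths_bp: list[int]) -> float:
    """Summed intron length in kilobases."""
    return sum(intron_lengths_bp) / 1000.0


if __name__ == "__main__":
    hypothetical_positions = [123, 301, 452]  # placeholder positions, not from Table I
    for pos in hypothetical_positions:
        print(pos, intron_type(pos))
```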
The introns do not clearly separate the previously predicted structural domains of the CR molecule (1, 4), as has been described for some proteins (12). However, intron 6 falls between two sequence replications, and intron 8 lies immediately proximal to the strongly charged carboxyl-terminal domain (Fig. 1, B and C) (1).
The insert includes 529 bp 5' to the approximate transcription start site, which we arbitrarily designate position 1, the 5'-most base in the CR cDNA clone. Within this 5'-flanking sequence are several putative regulatory sequences (Table II). These include a TATA box (-28 to -22), four CCAAT sequences (-93 to -89, -124 to -120, -194 to -190, and -207 to -211), and several GC-rich areas including four putative Sp1 binding sites (-12 to -7, -74 to -69, -312 to -307, and -362 to -357) (Fig. 2). These sequences are typical of the promoter elements of genes transcribed by RNA polymerase II (14, 15). The promoter region also contains the AP-2 sequence CCCAGGC (-521 to -515), which is found in the SV40 and bovine papilloma virus enhancers, the human c-myc and growth hormone regulatory regions, and the histocompatibility H-2Kb genes (16). The human histone H4TF-1 recognition sequence GATTTC is present at positions -183 to -188 (17). The sequence GGGNNGGG, where N is any base, occurs seven times in the CR promoter region (Fig. 2); two of these encompass Sp1 binding sites. There are also a number of other poly(G)-rich sequences, including nine GGG, three GGGNGGG, one GGGNNNGGG, one GGGNNNGGGNGGG, and one GGGNNGGGNNGGG sequence. The palindromes TGGTCGACCA and CAGCTG begin at positions -223 and -372, respectively (Fig. 2).
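Counting motifs such as GGGNNGGG by eye is error-prone, particularly because matches can overlap. The Python sketch below scans a 5'-flanking sequence for consensus motifs (with N as any base) on both strands and reports positions in the numbering used here: +1 at the start site and no position 0, so the base immediately upstream is -1. It is an illustration only: the function names are hypothetical, it looks for exact consensus matches rather than scoring degenerate binding sites, and it reports reverse-strand hits in top-strand coordinates, which may differ from how strand orientation is written in the text. `span_length` shows why a region such as -219 to +14 spans 233 bases rather than 234.

```python
import re

# Subset of IUPAC codes used in the text: N = any base.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def motif_regex(motif: str) -> str:
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(flank: str, motif: str, both_strands: bool = True):
    """Report overlapping motif matches in a 5'-flanking sequence whose last base is -1.

    Returns (strand, first, last) tuples in top-strand coordinates, sorted by position.
    """
    seq = flank.upper()
    n = len(seq)
    patterns = [("+", motif_regex(motif))]
    rc = reverse_complement(motif)
    if both_strands and rc != motif.upper():  # palindromes such as CAGCTG are searched once
        patterns.append(("-", motif_regex(rc)))
    hits = []
    for strand, regex in patterns:
        # A zero-width lookahead lets overlapping matches (e.g. in GGGNNGGGGNNGGG) all be found.
        for m in re.finditer(f"(?=({regex}))", seq):
            start = m.start(1)
            hits.append((strand, start - n, start + len(motif) - 1 - n))
    return sorted(hits, key=lambda hit: hit[1])


def span_length(first: int, last: int) -> int:
    """Bases from `first` to `last` inclusive when position 0 does not exist."""
    lo, hi = sorted((first, last))
    length = hi - lo + 1
    return length - 1 if lo < 0 < hi else length


if __name__ == "__main__":
    toy_flank = "TTTTCCAATGGGACGGGTT"  # placeholder, not the CR promoter sequence
    print(find_motif(toy_flank, "CCAAT"))
    print(find_motif(toy_flank, "GGGNNGGG"))
    print(span_length(-219, 14))
```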

where 26 of 28 nucleotides are either guanine or thymine (data not shown). Such downstream GT-rich areas are thought to be important in cleavage of the transcript distal to the polyadenylation signal before addition of the poly(A) tail (19).
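A GT-rich element of this kind can be located with a sliding-window count. The Python sketch below assumes a 28-nt window to match the 26-of-28 observation; the window width, the function name, and the toy sequence are illustrative, and a real analysis would scan the actual sequence downstream of the polyadenylation signal.

```python
def best_gt_window(seq: str, width: int = 28) -> tuple[int, int]:
    """Return (0-based start, G+T count) of the first `width`-nt window richest in G and T."""
    seq = seq.upper()
    if len(seq) < width:
        raise ValueError("sequence shorter than window")
    is_gt = [1 if base in "GT" else 0 for base in seq]
    count = sum(is_gt[:width])
    best_start, best_count = 0, count
    for start in range(1, len(seq) - width + 1):
        # Slide by one base: add the base entering the window, drop the one leaving it.
        count += is_gt[start + width - 1] - is_gt[start - 1]
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


if __name__ == "__main__":
    toy = "A" * 10 + "GT" * 14 + "A" * 10  # placeholder, not the CR 3' sequence
    start, gt = best_gt_window(toy)
    print(f"window starting at index {start}: {gt} of 28 bases are G or T")
```

The running count makes the scan linear in sequence length instead of recounting every window from scratch.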

To better understand the transcriptional regulation of the CR gene, we performed gel mobility shift analysis and DNA footprinting. A PstI-AccI DNA fragment containing the region from -219 to +14 was used as a binding probe. This region contains many of the possible promoter cis-elements for basal-level transcription by RNA polymerase II (14, 15): CCAAT (-89 to -93, -124 to -120, -194 to -190, -207 to -211), Sp1 (-12 to -7, -74 to -69), and TATA (-28 to -22) sites. Nuclear extracts from Wil-2, an Epstein-Barr virus-transformed human B-cell line, and BCL1, a murine B-cell lymphoma line, were used in the analysis. The results are shown in Fig. 3.
The Wil-2 nuclear extract produced a doublet of retarded bands with the CR promoter probe, whereas the BCL1 extract produced no gel retardation (Fig. 3). These results indicate that the Wil-2 extract contains trans-factors that form DNA-protein complexes with the CR promoter and that are absent from murine BCL1 cells. Identical retarded bands were seen in the same assay with human HeLa cell nuclear extract (data not shown).
Both Wil-2 and BCL1 nuclear extracts formed an octamer transcription factor-1 (OTF-1) complex when incubated with an immunoglobulin heavy chain promoter probe (Fig. 3). This result indicates that the failure of the BCL1 extract to form DNA-protein complexes with the CR promoter was not caused by general degradation of the BCL1 extract. Transcription factor binding sites were localized by DNA footprint analysis with the copper-phenanthroline method (10). Both Wil-2 and HeLa extracts protected the CR promoter fragment from positions -111 to -127 and -77 to -84 (Fig. 4). There are also enhanced cleavage sites at -109, -110, -115, -116, and -122 to -124, which might represent structural changes in the DNA segment resulting from protein binding. A synthetic oligonucleotide corresponding to positions -135 to -103 abrogated the gel retardation seen in Fig. 3 (data not shown), further supporting the presence of a binding site within this sequence.
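Relating the protected regions to the candidate elements listed earlier is an interval-overlap question. The Python sketch below uses only coordinates reported in this paper, treats each range as a closed interval regardless of the order in which its endpoints are written, and is a bookkeeping aid rather than evidence of which factor binds; the function names are hypothetical. With these coordinates, the -111 to -127 footprint overlaps only the CCAAT sequence at -124 to -120, the -77 to -84 footprint overlaps none of the listed elements, and the -135 to -103 competitor oligonucleotide fully contains the first footprint.

```python
# Coordinates as reported in the text (inclusive; endpoint order as written there).
FOOTPRINTS = [(-111, -127), (-77, -84)]
ELEMENTS = {
    "CCAAT": [(-93, -89), (-124, -120), (-194, -190), (-207, -211)],
    "Sp1": [(-12, -7), (-74, -69), (-312, -307), (-362, -357)],
    "TATA": [(-28, -22)],
}


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two closed intervals share at least one position."""
    a_lo, a_hi = sorted(a)
    b_lo, b_hi = sorted(b)
    return a_lo <= b_hi and b_lo <= a_hi


def contains(outer: tuple[int, int], inner: tuple[int, int]) -> bool:
    """True if `inner` lies entirely within `outer`."""
    o_lo, o_hi = sorted(outer)
    i_lo, i_hi = sorted(inner)
    return o_lo <= i_lo and i_hi <= o_hi


def annotate(footprints, elements):
    """List (footprint, element name, site) for every footprint/element overlap."""
    return [(fp, name, site)
            for fp in footprints
            for name, sites in elements.items()
            for site in sites
            if overlaps(fp, site)]


if __name__ == "__main__":
    for fp, name, site in annotate(FOOTPRINTS, ELEMENTS):
        print(f"footprint {fp} overlaps {name} site {site}")
    print("oligo -135..-103 contains first footprint:", contains((-135, -103), FOOTPRINTS[0]))
```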

DISCUSSION
Using our previously isolated 1.9-kb human CR cDNA (1), we have isolated the human genomic form of this gene, which measures approximately 5 kb from its RNA polymerase II promoter region to its downstream GT-rich processing element. The presence of multiple GC-rich areas in the promoter portion of this gene, including four putative Sp1 binding sites, suggests that its protein product may have a housekeeping function (14).
The multiple CCAAT sequences in the promoter region are somewhat unusual; although it is uncertain how many of them are functional, their multiplicity suggests a more complex system of transcriptional control. Interestingly, multiple CCAAT sequences and two putative Sp1 binding sites are also present in the human glucose-regulated protein (GRP78 and GRP94) and protein disulfide isomerase promoters (20-22). GRP78 and GRP94 have greater than 45% sequence homology with heat shock protein (HSP) 70 and HSP90/83, respectively (23, 24). Like HSP70 and HSP90, the GRP proteins and protein disulfide isomerase are thought to play a role in protein transport, folding, and/or assembly (23, 25). Previous studies have demonstrated that GRP78 (also known as BiP) may assist with the linkage of immunoglobulin heavy and light chains in B-lymphocytes (26-28). GRP gene expression is constitutive and is inducible by glucose deprivation, blockage of glycosylation, and disruption of intracellular calcium levels (29-32). Protein disulfide isomerase has many different functions and is believed to act as a cellular catalyst of disulfide bond formation (33-36).

Fig. 4. DNA fragments were treated with copper-phenanthroline in the gel matrix for 3-4 min after electrophoresis as shown in Fig. 3. The free and retarded DNA probes were recovered and loaded onto a 10% denaturing polyacrylamide gel. Lane 1, G reaction of the CR promoter probe; lane 2, DNA footprinting pattern of free probe; lanes 3 and 4, DNA fragments from retarded complexes with HeLa extract and Wil-2 extract, respectively. Arrows mark the enhanced cleavage sites on the DNA fragment within the complexes. The protected areas are half-bracketed.
Similarities among the CR, GRP78, GRP94, and protein disulfide isomerase promoter sequences are of interest because each gene encodes an acidic calcium binding protein that localizes to the endoplasmic reticulum and has the carboxyl-terminal KDEL ER retention sequence (21, 37, 38). Additionally, CR gene expression, like GRP gene expression, is inducible by calcium ionophores (39, 40). It would therefore follow that there might be one or more common transcriptional regulatory elements in the CR, GRP, and protein disulfide isomerase promoter regions. Alignment of the CCAAT or CCAAT-like sequences of CR, protein disulfide isomerase, GRP78, and GRP94 indicates that multiple CCAAT or CCAAT-like sequences occur approximately 30-50 bases apart in each of these promoters (Fig. 5A). Some genes encoding HSPs have been shown to have multiple copies of promoter elements with enhancer properties that appear to have arisen through sequence replications (41-43). This is intriguing because the GRP78 protein sequence is approximately 60% homologous to HSP70 and its promoter has been shown to have enhancer-like activity (23, 44). The CR promoter region has multiple short sequence repetitions in addition to the multiple CCAAT sequences and thus could conceivably have enhancer-like activity as well.

Fig. 5. A indicates the complementary strand. The topmost sequence of GRP94, GRP78, and CR contains a DNA footprint-protected sequence that is overlined. For ease of sequence comparison, nucleotide doublets are in boldface, and higher base repetitions are in lower case. B, two different sequence alignments of the protected regions with greatest sequence homology. Upper case letters in the consensus sequence represent a 3 out of 3 base match, and lower case letters represent a 2 out of 3 base match. The numbers reference nucleotide positions relative to the putative messenger RNA transcription initiation site. CCAAT and CCAAT-like sequences are underlined. Gaps are indicated in the alignments, and nucleotide matches are marked by colons (:).
Previous DNA footprint analysis and chloramphenicol acetyltransferase transcriptional studies have localized a major regulatory element in the human GRP78 promoter (-160 to -101) that is homologous to a major regulatory element in the human GRP94 promoter (-201 to -164) (21, 45). These elements are important for the constitutive expression of these two genes (21, 45). Comparison of these two regions with the CR promoter region revealed several homologous sequences centered around a CCAAT motif (Fig. 5A). One of these homologous CR sequences was protected in DNA footprint analysis and has significant sequence homology with the protected regions of GRP94 and GRP78 (Fig. 5B). This suggests that CR and the GRPs may share a common regulatory sequence that would enable these genes to be coordinately expressed, as previous studies have suggested (39).
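The consensus rule in the Fig. 5 legend (3 of 3 matching bases shown in upper case, 2 of 3 in lower case) is simple to apply programmatically. The Python sketch below assumes the three sequences are already aligned to equal length with "-" for gaps; the legend does not say how columns without a majority are shown, so the "." used here is an assumption, and the example sequences are placeholders rather than the aligned GRP94, GRP78, and CR regions.

```python
from collections import Counter


def three_way_consensus(a: str, b: str, c: str, gap: str = "-", no_match: str = ".") -> str:
    """Consensus of three aligned sequences: upper case if all 3 bases agree,
    lower case if 2 of 3 agree, and `no_match` (an assumed symbol) otherwise.
    """
    if not (len(a) == len(b) == len(c)):
        raise ValueError("sequences must be aligned to equal length")
    out = []
    for column in zip(a.upper(), b.upper(), c.upper()):
        base, n = Counter(column).most_common(1)[0]
        if n < 2 or base == gap:
            out.append(no_match)
        elif n == 3:
            out.append(base)
        else:
            out.append(base.lower())
    return "".join(out)


if __name__ == "__main__":
    # Placeholder aligned sequences, not the Fig. 5B alignment.
    print(three_way_consensus("CCAATGG-A", "CCAATCGTA", "CCAGTGGTC"))
```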
The specific retarded bands seen in the CR gel mobility shift assay (Fig. 3) are unlikely to be DNA-protein complexes formed at common transcriptional motifs such as the CCAAT box, TATA box, or Sp1 sites, because these trans-factors are common to all mammalian cells. If such common DNA-protein complexes were detectable, the same retarded bands should have been observed with all extracts; however, the complexes were detected with both human nuclear extracts but not with the murine extract. It is interesting that the gel shift of the CR promoter fragment could be abrogated by a synthetic oligonucleotide corresponding to an area protected in footprint analysis (Fig. 2). This oligonucleotide includes a CCAAT sequence, but a CCAAT trans-factor is unlikely to be responsible for the gel shift, because the BCL1 nuclear extract, which contains CCAAT box binding protein, causes no shift. The characteristics of these putative transcriptional regulatory factors remain to be elucidated.
The DNA footprint analysis of CR revealed enhanced cleavage sites that might indicate a structural change in the DNA segment resulting from the binding of trans-factors (46). Such structural changes often aid in opening a DNA segment for the initiation of RNA polymerase II transcription.
The AP-2 (16) and H4TF-1 (17) recognition sequences found in the CR promoter region are typically present in genes that are active during cellular proliferation. Their presence in the CR gene is thus consistent with the finding that CR messenger RNA is expressed at higher levels in rapidly dividing lymphocytic cell lines and in lipopolysaccharide-stimulated peripheral blood leukocytes than in unstimulated peripheral blood leukocytes.* The human GRP78 and GRP94 promoter regions also contain an AP-2 sequence (20, 21).
Interrupted poly(G) sequences occur frequently in the CR promoter region, particularly the GGGNNGGG motif, which may be a recognition sequence for transcriptional regulation, although none of these sequences were protected in our DNA footprint analysis. The GRP78 and GRP94 promoter regions are extremely GC-rich like the CR promoter region, but each has only the sequence GGGNNGGGGNNGGG, which resembles two overlapping GGGNNGGG motifs. However, it is interesting to note that the rabbit calsequestrin (CS) promoter region has six different GGGNNGGG motifs (47). Like CR, CS is an acidic calcium binding protein of the sarcoplasmic reticulum (47). Unlike CR, however, CS is not found in the ER (48). CR is thought to be a major calcium storage depot in the ER, analogous to the calcium binding function of CS in the sarcoplasmic reticulum (48, 49). To our knowledge, the GGGNNGGG motif has not been described as a sequence-specific binding site for transcriptional regulation. However, the sequence GGGGCGGG, which conforms to this motif, has been described in chicken globin genes as a nuclear transcription factor binding site (50, 51). Although this sequence is similar to the Sp1 recognition sequence, there is evidence that a nuclear factor other than Sp1 binds to it (51).
CR does not share the high degree of amino acid sequence homology with previously published HSP sequences that the GRPs do. However, like some HSPs (including members of the HSP70 family), CR is highly conserved among species and can be targeted in autoimmune disease (2, 52, 53). The vast majority of patients with the autoimmune diseases subacute cutaneous lupus erythematosus and neonatal lupus erythematosus have Ro/SS-A antibodies, and their skin disease is frequently exacerbated by UV light exposure. UV light exposure may induce a heat shock response in human keratinocytes that up-regulates the expression of CR. One group of investigators has reported that CR can be transiently secreted from cells after treatment with calcium ionophores, although this has not been substantiated by others (39, 54). This suggests a mechanism by which UV light might displace CR from the ER to the extracellular space, where it could be targeted by circulating Ro/SS-A antibodies, resulting in tissue injury. This hypothesis is further supported by immunofluorescence studies showing UV light-induced displacement of Ro/SS-A antigen from within keratinocytes to the cell surface (55).