Identification of a Carcinoembryonic Antigen Gene Family in the Rat ANALYSIS OF THE N-TERMINAL DOMAINS REVEALS IMMUNOGLOBULIN-LIKE, HYPERVARIABLE REGIONS*

The existence of a carcinoembryonic antigen (CEA)-like gene family in rat has been demonstrated through isolation and sequencing of the N-terminal domain exons of presumably five discrete genes (rnCGM1-5). This finding will allow for the first time the study of functional and clinical aspects of the tumor marker CEA and related antigens in an animal model. Sequence comparison with the corresponding regions of members of the human CEA gene family revealed a relatively low similarity at the amino acid level, which indicates rapid divergence of the CEA gene family during evolution and explains the lack of cross-reactivity of rat CEA-like antigens with antibodies directed against human CEA. The N-terminal domains of the rat CEA-like proteins show structural similarity to immunoglobulin variable domains, including the presence of hypervariable regions, which points to a possible receptor function of the CEA family members. Although so far only one of the five rat CEA-like genes could be shown to be transcriptionally active, multiple mRNA species derived from other members of the rat CEA-like gene family have been found to be differentially expressed in rat placenta and liver.
Secondary structure calculations were performed with the computer program "Novotny" (PCgene, Genofit, Switzerland).

* This work was supported by a grant from the Dr. Mildred Scheel-Stiftung für Krebsforschung. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M22226-M22230.
§ To whom correspondence and reprint requests should be addressed.
¹ The abbreviations used are: CEA, carcinoembryonic antigen; NCA, nonspecific cross-reacting antigen; PSβG, pregnancy-specific β1-glycoprotein; kb, kilobase pair(s).

CEA,¹ one of the most widely used human tumor markers, belongs to a highly conserved protein family (Shively and Beatty, 1985), the members of which are encoded by approximately ten genes (Thompson et al., 1987). The complete primary structures of three members of the human CEA protein family, as deduced from cDNA sequences, have recently been reported: CEA (Oikawa et al., 1987a; Beauchemin et al., 1987; Zimmermann et al., 1987), a nonspecific cross-reacting antigen (NCA) (Tawaragi et al., 1988; Neumaier et al., 1988), and a pregnancy-specific β1-glycoprotein (PSβG) (Watanabe and Chou, 1988a, 1988b). Amino acid sequence analyses have revealed that the CEA gene family shows homology to and can be placed within the immunoglobulin supergene family (Paxton et al., 1987; Oikawa et al., 1987b; Williams, 1987). CEA, NCA, and PSβG can be subdivided into a number of domains: a 34-amino acid leader sequence, a 108-110-amino acid N-terminal domain, a highly conserved 178-180-amino acid repeating unit, of which three copies can be found in CEA, one and a half in PSβG, and only one in NCA, and a 26-amino acid hydrophobic carboxyl region in CEA, which is two amino acids shorter in NCA and is degenerate in PSβG. The corresponding domains show a high degree of sequence conservation between the three proteins, which suggests a common ancestry (Thompson and Zimmermann, 1988).
The evolution of the CEA family presents a mystery, because analyses with polyclonal antisera, which recognize human CEA, NCA, and a number of other members of this family, have been unable to unequivocally identify their counterparts in nonprimate mammals (Wahren et al., 1983). So far, CEA-related molecules could only be detected in higher primates (Haagensen et al., 1982; Jantscheff et al., 1986). The inability to recognize CEA-like antigens below the higher primates indicates either that such molecules do not exist in these species or that they have diverged rapidly during evolution. The extremely high degree of sequence conservation between CEA and an NCA (Thompson and Zimmermann, 1988), as well as the strong homology to another human gene family member (Watanabe and Chou, 1988b), would favor the former speculation. However, a number of reports indicate the existence of oncodevelopmentally regulated glycoproteins in rodents with biochemical properties very similar to those of CEA (Abeyounis and Milgrom, 1976; Howell et al., 1979; Stevens et al., 1975, 1976; Martin et al., 1975a, 1975b; van Hove et al., 1978). In order to clarify this problem, we have attempted to isolate CEA-like gene fragments from a rat genomic library. The existence of CEA-like molecules in the rat would allow many studies to be made in an animal system, e.g. studies on the oncodevelopmental regulation of CEA gene expression, which cannot be carried out in humans.

MATERIALS AND METHODS
Tissues-Rat tissues were obtained from the strain BD II (Druckrey, 1971). The anesthetized rats were killed by cervical dislocation, and various tissues were immediately removed, washed in cold phosphate-buffered saline, frozen in liquid nitrogen, and stored at -140 °C.
Isolation of pNCA1 and pCEA5-The NCA cDNA clone pNCA1 and the CEA cDNA clone pCEA5 were isolated from a cDNA library, which had been prepared from human colon tumor mRNA (Zimmermann et al., 1987). The library was screened with the 2.7-kb EcoRI DNA fragment from clone λ39.2, which contains the exon coding for the N-terminal domain of an NCA (Thompson et al., 1987).
Screening of the Rat Gene Library-For the isolation of members of the CEA gene family, a genomic library derived from rat liver DNA was used. The library had been constructed by partial digestion of the genomic DNA with Sau3A and cloning of the DNA fragments into the BamHI site of the λ-phage vector EMBL3 (Shinomiya et al., 1984; Frischauf et al., 1983). The recombinant phages were plated onto Q359 bacteria (Karn et al., 1983), transferred to nitrocellulose filters (Schleicher and Schüll, Federal Republic of Germany) in replicas, and hybridized with CEA and NCA cDNA fragments labeled by random hexanucleotide priming (Feinberg and Vogelstein, 1983) at 37 °C in the presence of 40% formamide, 5× Denhardt's solution (1× = 0.02% each of Ficoll (Pharmacia, Federal Republic of Germany), polyvinylpyrrolidone, and bovine serum albumin), 5× SSPE (1× SSPE = 0.18 M NaCl, 10 mM sodium phosphate, pH 7.4, 1 mM EDTA), 0.1% sodium dodecyl sulfate, and 100 µg of heat-denatured calf thymus DNA/ml. After hybridization overnight, the filters were washed twice for 30 min each in 2× SSPE, 0.1% sodium dodecyl sulfate at room temperature and at 60 °C, respectively. Positive plaques were isolated and plaque-purified twice (Maniatis et al., 1982).
DNA Sequencing and Sequence Analysis-For sequencing, exon-containing subfragments of the recombinant phage DNAs were identified. After digestion with various restriction endonucleases, the resulting DNA fragments were electrophoretically separated, blotted onto a nylon membrane (Genescreen Plus, New England Nuclear, Federal Republic of Germany), and hybridized with the same CEA and NCA cDNA probes used for the isolation of the genomic clones under the conditions described above. Suitable hybridizing genomic DNA and cDNA fragments were subcloned into Bluescript (Stratagene, La Jolla, CA) or M13 vectors.
Sequencing was performed on single- or double-stranded templates according to Sanger (Sanger et al., 1977), using universal or internal oligonucleotide primers. The oligonucleotides were synthesized by the phosphoramidite method on an Applied Biosystems 308A DNA synthesizer (Applied Biosystems, Weiterstadt, Federal Republic of Germany). The oligonucleotides still carrying the trityl group at the 5' end were purified by high pressure liquid chromatography on a C1 Ultropac TSK TMS-250 column (LKB, Freiburg, Federal Republic of Germany) using a gradient of acetonitrile (10-25%) in 0.1 M triethylammonium acetate, pH 7, for elution. After deprotection and removal of the trityl group according to the manufacturer's protocol, the oligonucleotides could be used directly for sequencing.
For comparison of the nucleotide and amino acid sequences, the computer program "Align2" was used. For the determination of the similarity between amino acid sequences, pairs of amino acids with log odds matrix scores ≥ 9 (Feng et al., 1985) were taken as being conservatively exchanged.
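The distinction drawn above between identical residues and conservatively exchanged residues can be illustrated with a minimal Python sketch. It is not the "Align2" program and does not use the log odds matrix of Feng et al. (1985); the residue groups in `CONSERVATIVE_GROUPS`, the function names, and the example sequences are hypothetical stand-ins chosen only to show how percent identity and percent similarity differ for one pre-aligned pair.

```python
# Hypothetical conservative-exchange groups, for illustration only.
# The paper instead applied a score threshold to a log odds matrix.
CONSERVATIVE_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"),
    set("DE"), set("NQ"), set("ST"), set("AG"),
]


def is_conservative(a, b):
    """Return True if both residues fall into the same illustrative group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def identity_and_similarity(seq1, seq2, gap="-"):
    """Percent identity and percent similarity of two pre-aligned sequences.

    Columns containing a gap in either sequence are skipped. Similarity
    counts identical residues plus conservative exchanges.
    """
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif is_conservative(a, b):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


if __name__ == "__main__":
    # Made-up aligned peptides, not taken from the rat or human sequences.
    print(identity_and_similarity("KLTI-ES", "RLTVQDS"))
```

With a scoring matrix in place of the fixed groups, `is_conservative` would compare the matrix score of the pair against the chosen threshold; the rest of the calculation would stay the same.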
RNA Blot Hybridization-Isolation and analysis of RNA by Northern blot hybridization was performed essentially as described before (Zimmermann et al., 1988). However, enrichment for poly(A) RNA was achieved by only one round of chromatography on oligo(dT)-cellulose.

RESULTS
Isolation of Genomic Clones Encoding Members of the Rat CEA Gene Family-To demonstrate the existence of a CEA-like gene family in the rat, we used a mixture of cDNA fragments, coding for regions of the human CEA and an NCA, as probes for Southern blot analyses. The location of the CEA and NCA cDNAs with respect to their mRNAs is shown in [...] liver genomic library (Shinomiya et al., 1984). Thirteen positive clones were obtained. The DNA of six phage recombinants was further characterized by restriction endonuclease analyses. Comparison of the size patterns of the DNA fragments revealed that two of these clones were identical and two others appeared to overlap. Thus, these genomic clones are apparently derived from four different chromosomal loci of the rat.
CEA Gene Family in Rat 6909
[...] in clone λrnCGM4/5-1 (Fig. 1B). These fragments were subcloned and sequenced according to the strategy shown in Fig. 1B. The nucleotide and the deduced amino acid sequences of the N-terminal domain exons are presented in Fig. 2. In each genomic fragment, one open reading frame flanked by canonical splice acceptor and donor sequences (Mount, 1982) could be identified (Fig. 2). The overall similarity among these homologous rat exons lies between 56 and 84% at the nucleotide level and between 39 and 76% at the amino acid level (Fig. 3). This rather low degree of conservation of the amino acid sequence among the different members of the rat CEA gene family increases strongly if conservative exchanges are allowed (Fig. 4).

Sequence Analysis of the N-terminal Domain Exons and Flanking Introns
The deduced amino acid sequences of the N-terminal domains contain two to five putative N-glycosylation sites (consensus sequence: Asn-X-Ser/Thr, X ≠ Pro), only one of which is conserved in all five sequences (Fig. 4). The presence of multiple glycosylation sites implies that at least some of the putative CEA-like antigens in the rat are as heavily glycosylated as the human CEA, where 50-60% of its total mass is carbohydrate (Terry et al., 1974).
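The consensus sequence quoted above (Asn-X-Ser/Thr, X ≠ Pro) translates directly into a pattern search. The following Python sketch is only an illustration of that rule; the function name and the example peptide are hypothetical and are not taken from the rat sequences in Fig. 4.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr.
# A lookahead is used so that overlapping sequons are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_n_glycosylation_sites(protein):
    """Return 1-based positions of the Asn in each putative sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    # Made-up peptide: N-G-T matches, N-P-S is excluded because X is Pro,
    # and N-N-S / N-S-T overlap and are both reported.
    print(find_n_glycosylation_sites("MNGTANPSNNST"))
```

A match only marks a putative site; whether a given asparagine is actually glycosylated cannot be decided from the sequence alone.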
The similarity between the 5'-flanking intron sequences of the different N-terminal domain exons is, with the exception of rnCGM1 and rnCGM3, in general very low. In the 5'-flanking regions of the N-terminal domain exon of clones λrnCGM1-1 and λrnCGM3-1, simple repeated sequences ([d(GGA)]n or [d(AGA)]n and [d(CCTT)]n) of varying length are found (Figs. 1B and 2). These purine- and pyrimidine-rich sequences are located about 300 and 550 nucleotides, respectively, upstream from the start of the N-terminal exons.
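Tandem runs of short units such as d(GGA) or d(CCTT) can be located in a flanking sequence with a simple pattern search. The Python sketch below is a hedged illustration, not the procedure used in this work; the function name, the `min_copies` cutoff of three, and the test sequence are all assumptions introduced for the example.

```python
import re


def find_simple_repeats(dna, unit, min_copies=3):
    """Find perfect tandem repeats of `unit` in `dna`.

    Returns a list of (0-based start, copy number) tuples for every run
    containing at least `min_copies` uninterrupted copies of the unit.
    The cutoff is an arbitrary illustrative choice.
    """
    pattern = re.compile(r"(?:%s){%d,}" % (re.escape(unit.upper()), min_copies))
    return [
        (m.start(), len(m.group()) // len(unit))
        for m in pattern.finditer(dna.upper())
    ]


if __name__ == "__main__":
    # Made-up sequence: five GGA units, a spacer, then four CCTT units.
    seq = "ACGT" + "GGA" * 5 + "TAT" + "CCTT" * 4 + "A"
    print(find_simple_repeats(seq, "GGA"))
    print(find_simple_repeats(seq, "CCTT"))
```

Because only perfect copies are matched, a run interrupted by a single base change is reported as two shorter runs or not at all; tolerating such deviations would need an approximate-matching approach.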
Expression of Members of the Rat CEA Gene Family-In order to identify tissues where genes of the rat CEA gene family are transcribed, we screened, in a preliminary experiment, a number of fetal and adult rat tissues by slot blot hybridization. The 1.1-kb EcoRI fragment of λrnCGM1-1 containing the N-terminal domain exon (Fig. 1B) was used as a probe under nonstringent hybridization conditions (2× SSPE, 60 °C). With RNA from placenta, one of the most prominent hybridization signals was obtained (data not shown). We, therefore, hybridized size-fractionated poly(A) RNA from rat placenta and, for comparison, poly(A) RNA from adult liver, with DNA fragments covering all five N-terminal domain exons. With the probe from λrnCGM1-1, three major mRNA species with lengths of 3.9, 3.2, and 2.5 kb could be detected, whereas the probes from the N-terminal exons of λrnCGM3-1 and λrnCGM4/5-1, exon A, hybridized strongly with the smallest of the three mRNA species only and to a lesser extent with a 3.0-kb species (Fig. 5, lanes 2, 6, and 8). After prolonged exposure of the RNA blot hybridized with the probe from λrnCGM1-1, CEA-related mRNA species with lengths of 4.6, 3.9, 3.0, and 1.9 kb could be shown to exist in liver (Fig. 5). All observed RNA/DNA hybrids were unstable under more stringent (0.5× SSPE, 65 °C) washing conditions (data not shown). Essentially, the same results were obtained when probes lacking the simple repeated sequences described above were used (data not shown). The exon-containing fragments from λrnCGM2-1 (the 0.4-kb SstI fragment) and λrnCGM4/5-1 (the 1.5-kb SstI/BamHI fragment with part of exon B) (Fig. 1B) did not hybridize at all under the same hybridization conditions (data not shown).
FIG. 5 (legend, partial). After washing in 2× SSPE at 60 °C, the membranes were exposed to x-ray films for 6 h (lanes 1, 2, 5, and 6). In order to visualize minor mRNA species or weak cross-hybridization, the membranes were reexposed for 2 days (lanes 3, 4, 7, and 8). The smear in the high molecular weight range in lanes 3 and 4 is probably due to binding of the probes to DNA present in these RNA preparations. The numbers on the left indicate the size (in kilobases) of the hybridizing mRNAs.

DISCUSSION
In this paper, we present data that demonstrate the existence of a CEA-like gene family in the rat that is similar in structure to its counterpart in humans (Fig. 4). We have sequenced five different N-terminal domain exons, which show very similar length and exact conservation of the exon/intron borders when compared with the corresponding exons of the various members of the human CEA gene family (Thompson et al., 1987; Oikawa et al., 1987c; Footnote 3). At present, we assume, in analogy to the human CEA gene family, that only one N-terminal domain is contained within each CEA-like antigen of the rat (Beauchemin et al., 1987; Oikawa et al., 1987a; Neumaier et al., 1988; Tawaragi et al., 1988). The five exons, therefore, presumably represent five separate genes. Until individual CEA-like antigens have been characterized in the rat, we have proposed a temporary nomenclature system for these as well as other genes (Thompson and Zimmermann, 1988). Each gene has been designated numerically according to its species (Rattus norvegicus = rn) as a CEA gene-family member (CGM; e.g. rnCGM1). Until now, the presence of CEA-like genes in rodents was not clear, because CEA-like antigens could not be detected with antibodies directed against human CEA in species below the higher primates (Wahren et al., 1983). Taking into account that most epitopes are on the protein moiety of CEA (Hammarstrom et al., 1975) and the low degree of amino acid sequence conservation between the rat and human members of the CEA gene family (Fig. 3), the lack of immunological cross-reactivity between rat CEA-like antigens and antisera against human CEA is not surprising (Johnson et al., 1985). However, further experiments have to be carried out to prove that the CEA-like antigens described before (Martin et al., 1975a; Stevens et al., 1975, 1976; Abeyounis and Milgrom, 1976; van Hove et al., 1978) indeed belong to the rat CEA family.
RNA/DNA hybridization studies showed that at least some genes of this family are transcriptionally active in rat tissues. Six distinct mRNA species have been found in rat placenta and liver. The expression seems to be controlled in a tissue-specific manner and shows quantitative and qualitative differences. In placenta, the mRNA species with lengths of 3.9, 3.2, and 2.5 kb are very prominent. The latter two mRNAs are not present in liver, where, however, two additional mRNAs of 4.6 and 1.9 kb are found. All CEA-like mRNAs seen in liver are expressed at a comparatively low level. At present, these data cannot be interpreted extensively, because under stringent hybridization conditions, no transcripts could be visualized for any of the genes characterized in this paper in liver or placenta. Recently, however, we have isolated a clone from a rat placental cDNA library, and partial sequence analysis reveals identity to rnCGM1, proving this gene to be transcribed in placenta. As possible explanations for this apparent contradiction, we assume either that this gene is expressed at a level too low to be visualized in DNA/RNA hybridization analyses or that it is differentially expressed during placental development. Further experiments have to be performed to determine whether the other genes are active or represent pseudogenes.
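The count of six distinct mRNA species follows from combining the transcript sizes reported for the two tissues in the Northern blot analysis (3.9, 3.2, 2.5, and 3.0 kb in placenta; 4.6, 3.9, 3.0, and 1.9 kb in liver). As a small worked check, the Python sketch below does the set arithmetic; the variable and function names are illustrative only, and treating bands of equal apparent size in the two tissues as the same species is an assumption of the sketch.

```python
# Transcript sizes in kb as reported in the text for each tissue.
PLACENTA_KB = {3.9, 3.2, 2.5, 3.0}
LIVER_KB = {4.6, 3.9, 3.0, 1.9}


def compare_tissues(a, b):
    """Summarize which size classes are shared or specific to one tissue.

    Assumes that bands of the same apparent size represent the same
    species, which a blot alone cannot establish.
    """
    return {
        "all": sorted(a | b, reverse=True),
        "shared": sorted(a & b, reverse=True),
        "only_first": sorted(a - b, reverse=True),
        "only_second": sorted(b - a, reverse=True),
    }


if __name__ == "__main__":
    summary = compare_tissues(PLACENTA_KB, LIVER_KB)
    for key, sizes in summary.items():
        print(key, sizes)
```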
When the nucleotide and derived amino acid sequences of the N-terminal domain exons of the rat CEA-like genes are compared with the corresponding exons of the human genes, a generally low sequence conservation is observed (see Fig. 3). The highest similarity at the amino acid level (54%) is found between NCA and rnCGM4 (Fig. 3). In contrast, most of the human exons and some of the rat exons show a higher degree of sequence conservation within a species (Fig. 3). For this reason, it is not possible to assign individual genes to their counterpart in the other species. In a further study, Southern analyses of human and rat genomic DNA probed with various exon-containing fragments indicate that the sequences of all members of the rat CEA-like gene family show strong divergence from their counterparts in man. This may create problems in finding analogous genes in the two species by sequence comparison alone.
The N-terminal domain exons identified so far in the rat reveal a maximal similarity of 76% at the amino acid level (rnCGM1/rnCGM3), which is much lower than the similarity among closely related CEA-like genes in man (up to 92%; Fig. 3). These data and the lack of strongly hybridizing genomic DNA fragments in the rat with rat exon-containing probes⁵ argue against the existence of subgroups with closely related members in this species as has been found in the human CEA gene family.³ Simple repeated DNA sequences, which might serve as hot spots of recombination (Cohen et al., 1982) or trigger gene conversion, are found upstream of the N-terminal domain exons of rnCGM1 and rnCGM3 (Fig. 2).
They consist of homopurine/homopyrimidine stretches with d(CCTT) and d(GGA) units, which are repeated to a different extent in the two genes. This type of sequence is quite abundant in the mammalian genome and is found in association with various genes (Muller et al., 1987; Kelly and Trowsdale, 1985; van den Heuvel et al., 1985; Delaey et al., 1987; Cohen et al., 1982). Deviations from the basic motif of the simple sequences in the rat CEA-like genes are mainly caused by transitions, so that the homogeneity of these homopurine/homopyrimidine stretches is not disrupted. This might be important for the still hypothetical function of such sequences, which are known to adopt an open conformation under superhelical stress, as assayed by their S1 nuclease hypersensitivity (Delaey et al., 1987). The open conformation may allow easy access for factors involved in control of transcription or recombination processes. The simple DNA sequences present in the rat CEA-like genes might, therefore, have been involved in the conservation of certain parts of these genes by gene conversion or formation of multiple CEA-like genes by unequal crossing over. The latter possible function is supported by the observed length heterogeneity of the simple sequences (Fig. 2) caused by imperfect alignment of the originally identical sequences and the presence of a 25-base pair direct repeat in the (GGA)n sequence in rnCGM3 (Fig. 2). Alternatively, slippage of DNA polymerase during replication of the simple repeated sequences could also account for this heterogeneity or formation of the direct repeat (Efstratiadis et al., 1980).
Alignment of the deduced amino acid sequences of the five N-terminal domains and partial leader sequences of the rat and a comparison with the corresponding human CEA sequences reveals strong conservation of certain amino acids. Among them are the first (serine) and the last (alanine) amino acids of the leader fragment, which are absolutely conserved in all rat (Fig. 4) and all human CEA-like antigen leader sequences analyzed so far (Thompson et al., 1987; Beauchemin et al., 1987; Watanabe and Chou, 1988a). These residues, therefore, could be an essential part of a putative signal peptidase recognition site (Perlman and Halvorson, 1983). In the N-terminal domain, two regions, one in the N-terminal and one in the C-terminal half, can be identified, where most of the conserved or conservatively exchanged amino acids are clustered. The latter region is flanked by two segments with low sequence similarity (Fig. 4). Recently, Williams (1987) suggested that the CEA repeat halves (Thompson and Zimmermann, 1988) reveal a close structural similarity to the constant domains of the immunoglobulins, whereas the N-terminal domain of CEA is more related to the variable domains. The latter suggestion is strengthened by the observation that 8 out of 13 amino acids, which are highly conserved in the variable domains of nearly all members of the immunoglobulin superfamily (Williams, 1987), are present in the N-terminal domain of all CEA-like antigens (Fig. 4). These critical amino acids probably play a key role in formation of the characteristic immunoglobulin fold, which in the case of the variable domain is composed of eight to nine β-strands (Williams, 1987). β-Strands very similar in number and length can also be predicted for the N-terminal domains of all CEA-like antigens of the rat, including the additional C' strand, characteristic for the variable domain of the immunoglobulins (only shown for rnCGM1; Fig. 4). This C' strand and two additional regions in the variable domain of immunoglobulin light chains form the antigen combining site or complementarity determining regions (Kabat et al., 1987; Fig. 4). In the N-terminal domain of the CEA-like molecules, two hypervariable regions (HVR1 and HVR2) are found to completely overlap two of the three complementarity determining regions of immunoglobulin light chain variable domains (Fig. 4).
In the immunoglobulins, the β-strands form two β-sheets, which are held together by a disulfide bond and a salt bridge between an arginine and aspartic acid. The latter two residues are also absolutely conserved in all members of the CEA gene family so far analyzed in rat. The ionic bond between arginine and aspartic acid is probably sufficient to stabilize the three-dimensional structure of the variable domain in immunoglobulins in the absence of the disulfide bridge (Williams, 1987). Taking these results together, the N-terminal domain of the CEA-like antigens is probably folded similarly to the variable domains of the immunoglobulins. This feature of the various members of the CEA family may indicate, in analogy to the immunoglobulin superfamily, a recognition or a receptor function for the CEA-like antigens, each conveying a separate function.