Characterization of Squid Crystallin Genes COMPARISON WITH MAMMALIAN GLUTATHIONE S-TRANSFERASE

Previous experiments have indicated that the crystallins of the squid lens (S-crystallins) are evolutionarily related to glutathione S-transferases (GST) (EC 2.5.1.18). Here we confirm by peptide sequencing that the crystallins of the lens of the squid Ommastrephes sloani pacificus comprise a family of GST-like proteins. Squid lens extracts showed 400 times less GST activity than those of liver using 1-chloro-2,4-dinitrobenzene as a substrate, suggesting that the abundant GST-like crystallins lack enzymatic activity. Four different cDNAs (pSL20-1, pSL18, pSL11, and pSL4) showed 20-45% similarity in homologous regions with mammalian GST polypeptides. pSL20-1, pSL18, and pSL4 each encode an S-crystallin with a unique internal peptide that is unrelated to mammalian GSTs or any other sequence in GenBank. The S-crystallin family is encoded in a minimum of 9-10 genes, and the exon-intron structures of at least two of these (SL20-1 and SL11) are similar to those of the mammalian GST genes. The SL20-1 gene has six exons, with its unique internal peptide encoded precisely in exon 4; the SL11 gene lacks a unique internal peptide and has five exons. Experiments using bacterial chloramphenicol acetyltransferase as a reporter gene showed that at least 84 and 111 base pairs of 5'-flanking sequence are needed for function of the SL20-1 and SL11 promoters, respectively, in a transfected rabbit lens epithelial cell line (N/N1003A). Within these regions each has a putative TATA box and an upstream AP-1 site overlapping with antioxidant responsive-like elements, which are regulatory elements in the rat GST Ya and quinone reductase genes responsive to oxidative stress.
The nucleotide sequences reported in this paper have been submitted to the GenBank/EMBL Data Bank with accession numbers M74315-M74327.
Present address: N. K. Koltzov Institute of Developmental Biology, USSR Academy of Sciences, Moscow, USSR.
Crystallins are a surprisingly diverse group of proteins comprising 80-90% of the soluble protein of the transparent eye lens (1-5). All vertebrates have α-, β-, and γ-crystallins, with the proportional amounts of each depending upon the species. There are also a number of crystallins which are present only in certain taxonomic groups or selected species of vertebrates. These taxon-specific crystallins are particularly fascinating in that they have been recruited from metabolic enzymes and are thus known as enzyme-crystallins (6, 7). In some cases enzyme-crystallins retain their catalytic activity when expressed in the lens as crystallins, while in other cases they appear metabolically inert in the lens.
The complex eyes of some invertebrates also contain transparent cellular lenses (8, 9). Cephalopods (10) and cubomedusian jellyfish (11) are two striking examples of convergent evolution leading to eyes and lenses with resemblances to those of vertebrates. In contrast to the extensive investigations which have been performed on the vertebrate crystallins, very few studies have been conducted on invertebrate crystallins. The major cephalopod and cubomedusian jellyfish crystallins differ from each other and from vertebrate crystallins (11-15). The octopus lens, however, has one less abundant crystallin (called Ω-crystallin) (16), which is related if not identical to aldehyde dehydrogenase (17), an enzyme also recruited as a lens crystallin in the mammalian elephant shrew (18). In addition to being directly relevant to understanding the basis of lens transparency, further knowledge of invertebrate lens crystallins may provide insights into possible similarities and differences during the convergent evolution of the eye lens of vertebrates and invertebrates, possibly leading to a greater understanding of the evolution of new organs in general.
Previous studies have shown that the squid lens contains one major family of soluble crystallins (called S-crystallins) with polypeptide molecular masses of 27-36 kDa (15, 19) which are clearly related to the family of glutathione S-transferases (GST) (20, 21). Here we show by peptide analysis and by cDNA and genomic DNA sequencing that squid S-crystallins comprise a family of GST-like polypeptides encoded by a minimum of 9-10 S-crystallin genes. We demonstrate that four analyzed S-crystallin genes are expressed predominantly, if not exclusively, in the lens and most probably do not code for an active squid GST. We characterize the exon-intron structure of two S-crystallin genes (SL20-1 and SL11), compare them with the exon-intron structures of vertebrate GST genes, and identify promoters for both squid genes which function in a transfected rabbit lens epithelial cell line.

MATERIALS AND METHODS
Obtaining Squid-Ommastrephes sloani pacificus was collected at the Union of Soviet Socialist Republics Pacific shore (Vladivostok) and Loligo opalescens was collected at the Hopkins Marine Station (Pacific Grove, CA). The lenses and specified organs were removed from the squids, stored in liquid nitrogen, and brought to the National Institutes of Health, Bethesda, MD on dry ice, where the present experiments were conducted.
Preparation of Crystallins-Lenses were homogenized in 5 volumes of 10 mM Tris-HCl, pH 7.5, and 1 mM EDTA, stirred for 30 min, and centrifuged for 15 min in an Eppendorf 5415 centrifuge at maximum speed, all at 4 °C. The supernatant fraction was used for SDS-polyacrylamide gel electrophoresis in a 12.5% running gel. After electrophoresis the proteins were either stained with Coomassie Blue or electroblotted onto a nitrocellulose filter and stained with Ponceau S. Partial amino acid sequences were obtained from specific bands on the filter, as given below.
Partial Sequencing of Lens Crystallins-Proteins present in the electrophoretically derived bands on the filters were eluted and digested with trypsin. The resultant tryptic peptides were separated by high performance liquid chromatography, and amino acid sequencing was performed as a service by Dr. W. Lane at the Microchemistry Facility, Biological Laboratories, Harvard University and by Cambridge Prochem (Cambridge, MA) (see Ref. 11).
GST Assay-Lenses and digestive glands were homogenized in 10 volumes of 0.1 M potassium phosphate, pH 7.0, and centrifuged for 15 min in an Eppendorf 5415 centrifuge at maximum speed at 4 °C. The supernatant fractions were assayed for GST activity with 1-chloro-2,4-dinitrobenzene as a substrate, as described (22, 23).
Isolation of Nucleic Acids-High molecular weight DNA was isolated from squid testis by standard procedures (24). Phage and plasmid DNAs were isolated using Qiagen columns (QIAGEN Inc., Studio City, CA). For transfections, plasmid DNAs were purified by the alkaline method (25) and subsequently banded twice in cesium chloride and ethidium bromide. RNA was isolated by the acidic guanidinium thiocyanate-phenol-chloroform extraction method (RNazol B, Cinna/Biotecx, Friendswood, TX) (26) from different squid organs, ethanol precipitated, and stored at -70 °C.
Northern Blot Analysis-Total RNA from the specified organs of the two species of squid was size-fractionated on 1.5% agarose, 2.2 M formaldehyde gels and transferred to Nitran nylon filters as recommended by the manufacturers (Schleicher & Schuell), cross-linked by ultraviolet irradiation using Stratalinker (Stratagene, La Jolla, CA), and hybridized to the indicated cDNA inserts. EcoRI-HindIII restriction fragments of cDNA inserts were gel-purified using Geneclean (BIO 101, La Jolla, CA) and labeled with [α-³²P]dCTP (3000 Ci/mmol, Amersham Corp.) using the random primer method (Bethesda Research Laboratories kit) to a specific activity near 10⁹ cpm/μg DNA. Hybridization occurred overnight at 42 °C in 40% formamide, 6 × SSC (standard saline citrate (0.15 M NaCl, 0.015 M sodium citrate)), 5 × Denhardt's solution, 10% dextran sulfate, 5 mM EDTA, 10 mM sodium phosphate, pH 6.8, 0.1% SDS, 100 μg/ml fragmented denatured salmon sperm DNA, and 1-2 × 10⁶ cpm/ml of cDNA probe. Filters were washed twice in 2 × SSC and 0.1% SDS at room temperature for 30 min, twice in 1 × SSC and 0.1% SDS at 45 °C for 40 min, and twice in 0.1 × SSC and 0.1% SDS at 45 °C for 40 min. In some cases where the background remained high, the filters were reautoradiographed after additional washes in 0.1 × SSC and 0.1% SDS at 60 °C for 40 min.

FIG. 1. Comparison of amino acid sequences of the squid crystallins encoded by pSL20-1, pSL11, pSL18, and pSL4 cDNAs with the rat GST Ya (40) and Yp (41) subunits. Invariant residues (60) are boxed. Dashes show the gaps which were introduced to maximize similarity. Positions of introns in the SL20-1, SL11, rat Ya (61), and rat Yp (62) genes are shown by arrows. Numbering of the crystallin encoded in pSL4 assumes that its NH2-terminal part has the same length as in other squid crystallins (see text). Central inserts in the polypeptides encoded by pSL20-1, pSL18, and pSL4 cDNAs are shaded. Repeats in pSL4 insert are boxed.
Southern Blot Analysis-Restricted genomic or recombinant phage DNAs were fractionated on 0.8% agarose gels in 0.04 M Tris-acetate, pH 7.2, 1 mM EDTA and transferred to nitrocellulose or Nitran nylon filters as recommended by the manufacturers (Schleicher & Schuell). Labeling of EcoRI-HindIII fragments of cDNA inserts and the conditions of hybridization and washing were identical to those used for the Northern blot analysis, except that all washes in 1 × SSC and 0.1 × SSC were conducted at 60 °C for 40 min. Oligonucleotide probes were radioactively labeled with [γ-³²P]ATP (7000 Ci/mmol, ICN, Irvine, CA) using T4 polynucleotide kinase (Bethesda Research Laboratories). Hybridization was for 2-3 h or overnight at 50 °C in 6 × SSC, 0.01 M sodium phosphate, pH 6.8, 1 mM EDTA, pH 8.0, and 0.5% SDS. Filters were washed four times for 5 min each time in 6 × SSC and 0.5% SDS at room temperature followed by washing for 1 min at 50 °C in the same solution.
Construction and Screening of the Squid Genomic Library-Squid genomic DNA was partially digested with MboI and approximately 8-21-kilobase pair fragments were isolated by sucrose gradient centrifugation and cloned in the EMBL-3 bacteriophage vector (27); 2 × 10⁶ independent plaques were obtained. The library was constructed and amplified as a service by Clontech (Palo Alto, CA). About 5 X loK plaques were screened initially with the ³²P-labeled EcoRI-HindIII restriction fragment of the pSL20-1 cDNA insert and subsequently with a mixture of ³²P-labeled EcoRI-HindIII fragments derived from the pSL11, pSL18, and pSL4 cDNAs. The hybridization and washing conditions were the same as those used for the Southern blot analysis, with the exception that dextran sulfate was omitted from the hybridization solution.
DNA Sequencing-The cDNA insert of pSL4 was sequenced by a combination of the Maxam-Gilbert technique (28) and the dideoxynucleotide termination method (29) using Sequenase Version 2.0 (U. S. Biochemical, Cleveland, OH) and [³⁵S]dATP (1000 Ci/mmol, Amersham Corp.). The genomic clones were sequenced in both directions by the dideoxynucleotide termination method after recloning the BglII restriction fragments of recombinant phages in the BamHI site of the Bluescript SK(-) plasmid (Stratagene, La Jolla, CA). Synthetic oligonucleotides were used as sequencing primers. Reaction products were resolved on 5, 6, and 8% polyacrylamide, 7 M urea gels. DNA sequences were analyzed with the IDEAS software package (30).
Estimation of Intron Size by PCR-The length of some introns in the SL20-1 and SL11 genes was estimated by PCR (31). Samples underwent 30 amplification cycles of denaturation at 94 °C for 1 min, followed by annealing at 55 °C for 2 min, and extension at 72 °C for 3 min in an automated thermocycler (Perkin-Elmer-Cetus). Two different sets of primers were used to estimate the length of each intron. Recombinant phage DNAs were used as templates.
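The arithmetic behind a PCR-based intron estimate can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes both primers lie in flanking exons, so that the intron length is the amplicon length minus the exonic sequence spanned by the primers. The function names, the 100-bp agreement tolerance, and all numeric values are hypothetical.

```python
def estimate_intron_length(amplicon_bp: int, exonic_bp_between_primers: int) -> int:
    """Intron length = PCR product length minus the exonic sequence it spans.

    exonic_bp_between_primers counts every exonic base from the 5' end of the
    forward primer to the 5' end of the reverse primer, primers included.
    """
    if amplicon_bp < exonic_bp_between_primers:
        raise ValueError("amplicon is shorter than the exonic sequence it should contain")
    return amplicon_bp - exonic_bp_between_primers


def consensus_estimate(estimates: list[int], tolerance_bp: int = 100) -> float:
    """Average independent estimates, refusing to average if they disagree."""
    if max(estimates) - min(estimates) > tolerance_bp:
        raise ValueError("primer sets disagree by more than the tolerance")
    return sum(estimates) / len(estimates)


# Hypothetical gel readings for one intron measured with two primer sets.
set_a = estimate_intron_length(amplicon_bp=1450, exonic_bp_between_primers=150)
set_b = estimate_intron_length(amplicon_bp=1520, exonic_bp_between_primers=240)
print(set_a, set_b, consensus_estimate([set_a, set_b]))
```

Using two primer sets per intron, as described above, gives two independent estimates; the tolerance check simply flags cases where they diverge enough to suggest a mispriming artifact.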
Squid Promoter-CAT Constructions-Different lengths of 5'-flanking sequence of the SL20-1 and SL11 genes were generated by five cycles of PCR as above using primers leaving a 5' SalI site and a 3' HindIII site on the amplified fragments for cloning. Amplifications were performed using 0.1 μg of plasmid containing the 5'-flanking sequence and part of the first exon (positions -897/+125 for SL20-1 and -1592/+141 for SL11 genes) as original templates. After digestion with HindIII and SalI, the PCR products were purified by Geneclean (BIO 101, La Jolla, CA) or acrylamide gel electrophoresis (28), cloned in the pSVOATCAT vector (32), and sequenced to verify identity with the original sequences.
Transfections-Transfections were performed in epithelial cell lines of lens (N/N1003A) (33) or kidney (RK13) (34) derived from rabbits. A minimum of three separate experiments performed with duplicate samples was done for each construct tested. Test plasmids (10 μg) and SV40 promoter/β-galactosidase plasmid pCH110 (35) (2 μg) were cotransfected as calcium phosphate precipitates with modifications described elsewhere (36). Cells were assayed for CAT activity by the fluor-diffusion method (37) and for β-galactosidase activity (38) 48 h after transfection. CAT activity was assayed at several time points, and the average was calculated from the linear range of values.
Primer Extension Analysis-Primer extension analysis was performed with 5 μg of total RNA from the squid lens. Primers complementary to the coding region of SL20-1 (positions +145 to +164) and SL11 (positions +161 to +180) (see Fig. 8) were used. Labeled primers were mixed with RNA, heated at 68 °C for 5 min, and incubated at 37 °C for 30 min to 1 h after addition of 10-20 units of avian myeloblastosis virus-reverse transcriptase (Stratagene, La Jolla, CA). RNA was hydrolyzed by NaOH treatment, and extended cDNAs were extracted with phenol-chloroform and ethanol precipitated. cDNAs were analyzed on 6% polyacrylamide, 7 M urea gels. Sequencing reaction ladders were used as size markers.

RESULTS AND DISCUSSION
Multiple cDNAs Encoding Squid Crystallins-Previously, several cDNAs with GST-like sequences derived from at least three genes were isolated from a squid lens cDNA library (20, 39). Here (Fig. 1) we compare the sequence of the deduced proteins of another newly isolated squid lens cDNA, pSL4, with those of pSL11, pSL20-1 (originally called pSL20), and pSL18. Fig. 1 also compares these four deduced squid crystallins with the rat Ya (40) and Yp (41) GST subunits, which they most resemble.
The deduced polypeptide molecular masses encoded in these cDNAs are approximately 24 kDa for pSL11, 26 kDa for pSL20-1, 36 kDa for pSL18, and 34 kDa for pSL4. pSL4 is incomplete since it lacks an initiating codon near its 5' end and stays in frame from the beginning. If one assumes that the full-length NH2-terminal sequence of the endogenous protein partially encoded by pSL4 is the same as that in the other crystallins of this family, the molecular mass of the complete polypeptide would be about 45 kDa. The alignment in Fig. 1 shows, as noted earlier (20), considerable conservation of the NH2 and COOH termini of the squid crystallins with each other and with the rat GST subunits (except that the Ya and Yp polypeptides are 16 and 8 residues longer at their COOH termini than the squid polypeptides, respectively). Although the pSL11 polypeptide can be aligned with the rat Ya and Yp polypeptides along their entire sequences, the pSL20-1, pSL18, and pSL4 polypeptides have unique inserts in their central regions (shaded in Fig. 1). Insert sizes for the different crystallins are 17 residues for pSL20-1, 103 residues for pSL18, and 193 residues for pSL4. These inserts are not homologous to the rat GST polypeptides or to any other sequence in GenBank. The insert of pSL4 S-crystallin contained four complete and two incomplete repeats of 10 amino acids with consensus RGDGGYXVQG/S. The NH2-terminal part of this repeat, RGD, present four times in the insert, was found previously in a variety of cell adhesive proteins like fibronectin (42) and is believed to be a part of [...] (43). Together, these data indicate a minimum of four different GST-like genes encoding the squid lens crystallins.
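The 10-residue consensus quoted above (RGDGGYXVQG/S, with X any residue and G or S at the last position) can be expressed as a regular expression. The sketch below is illustrative only: the helper name and the toy protein string are hypothetical, and the real pSL4 insert sequence is not reproduced here.

```python
import re

# Consensus RGDGGYXVQG/S: X = any residue, final position G or S.
REPEAT = re.compile(r"RGDGGY.VQ[GS]")


def find_repeats(protein: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched repeat) for each complete repeat."""
    return [(m.start() + 1, m.group()) for m in REPEAT.finditer(protein)]


# Hypothetical toy sequence with two tandem complete repeats.
toy = "AA" + "RGDGGYAVQG" + "RGDGGYTVQS" + "KK"
for start, unit in find_repeats(toy):
    print(start, unit)
print("RGD count:", toy.count("RGD"))
```

Incomplete repeats, such as the two mentioned in the text, would not match this strict pattern and would need a looser search.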
Peptide Evidence for a Family of GST-like Squid Lens Crystallins-In order to substantiate directly the presence of a family of related crystallin polypeptides with differing sizes in the squid lens and to attempt identifying the polypeptides encoded in the cDNAs described above, the soluble proteins of the squid lens were subjected to SDS-polyacrylamide gel electrophoresis, electroblotted onto nitrocellulose, eluted, and digested with trypsin, and some chromatographically isolated peptides were sequenced. The electrophoretic patterns of two concentrations of squid crystallins show a group of major bands with a molecular mass of about 25 kDa and several minor polypeptides with higher molecular masses (Fig. 2). Three to four tryptic peptides from each of four sizes of electroblotted proteins (cuts A-D in Fig. 2) were sequenced and compared with amino acid sequences deduced from the cDNAs described above encoding proteins with molecular masses consistent with those obtained from the polyacrylamide gel. Only two of the tryptic peptides matched perfectly with a region encoded in the cDNAs; these were NT67(2) and NT21, which were both encoded in pSL20-1. The other peptide sequences had a maximum of 50-87% identity with the deduced sequences found in the cDNAs, with the variations generally involving conservative amino acid changes (not shown). These results suggest the existence of a family of GST-like squid crystallins comprising more than the four members presented in Fig. 1.
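One simple way to score a tryptic peptide against a deduced protein sequence is an ungapped sliding-window comparison. The text does not say how the identity values were computed, so the following is only a sketch under that assumption; the function names and toy sequences are hypothetical.

```python
def percent_identity(peptide: str, window: str) -> float:
    """Percentage of positions at which two equal-length strings agree."""
    matches = sum(a == b for a, b in zip(peptide, window))
    return 100.0 * matches / len(peptide)


def best_ungapped_match(peptide: str, protein: str) -> tuple[float, int]:
    """Slide the peptide along the protein; return (best % identity, 0-based start).

    Returns (0.0, -1) when no window shares any residue with the peptide.
    """
    best = (0.0, -1)
    for start in range(len(protein) - len(peptide) + 1):
        score = percent_identity(peptide, protein[start:start + len(peptide)])
        if score > best[0]:
            best = (score, start)
    return best


# Hypothetical toy sequences: one perfect match and one single-residue mismatch.
print(best_ungapped_match("LKDE", "MMLKDEAA"))
print(best_ungapped_match("LKDQ", "MMLKDEAA"))
```

A score of 100% corresponds to the perfect matches reported for two peptides, while lower scores correspond to peptides that are related to, but not encoded by, the sequenced cDNAs.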
Since the major 24-27-kDa squid crystallins as well as less abundant higher molecular mass crystallins (about 32-34 kDa) are all related to mammalian GST and to each other, we conclude that the squid lens consists essentially of one class of GST-like polypeptides. Our recent studies on octopus lens crystallins gave similar but not identical results: besides the major GST-like S-crystallins there is another less abundant aldehyde dehydrogenase-like crystallin (Ω-crystallin) in octopus lens (17).
Enzyme Activity Tests for Squid Crystallins-In view of the relationship of the squid S-crystallins with GST, extracts of the squid lens were assayed for enzymatic activity and compared with comparable extracts from the squid digestive gland (functional analog of vertebrate liver) and from mammalian lenses using 1-chloro-2,4-dinitrobenzene as a substrate. Recently, GST with very high specific activity using 1-chloro-2,4-dinitrobenzene was isolated and partially characterized from the digestive gland of the squid Loligo vulgaris (44). Our results showed that the squid lens extract had 400 times less GST activity than the digestive gland extract (0.26 ± 0.02 versus 106.3 ± 1.0 OD/min/mg protein, respectively). This is similar to previous results obtained with the octopus (17). It is also noteworthy that the squid crystallins do not bind to glutathione-agarose² as do mammalian and squid GST (44). These data are consistent with a specialized structural rather than enzymatic role in the lens for cephalopod S-crystallins, although we cannot exclude the possibility that S-crystallins have some enzyme activity when tested with another of the possible substrates for GST (see Ref. 45).
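The "400 times less" figure follows directly from the two specific activities quoted above. The sketch below reproduces that ratio and attaches a rough uncertainty; it assumes the ± values are independent standard deviations, which the text does not state, and the function name is hypothetical.

```python
import math


def fold_difference(high: float, high_sd: float, low: float, low_sd: float) -> tuple[float, float]:
    """Ratio high/low with first-order error propagation for independent errors."""
    fold = high / low
    relative_error = math.sqrt((high_sd / high) ** 2 + (low_sd / low) ** 2)
    return fold, fold * relative_error


# Specific activities from the text (OD/min/mg protein).
fold, err = fold_difference(high=106.3, high_sd=1.0, low=0.26, low_sd=0.02)
print(f"digestive gland / lens = {fold:.0f}-fold (+/- {err:.0f})")
```

Under these assumptions the ratio comes out at roughly 409-fold, in line with the rounded value of 400 given in the text; most of the uncertainty comes from the low lens activity.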

Tissue Distribution of Squid Crystallin mRNAs-We next examined by Northern blotting the tissue distribution of RNAs which hybridize to the four cloned cDNAs (Fig. 3). Total RNA was used from lens, digestive gland, testis, and muscles of the homologous squid, O. pacificus, and from lens, gills, heart, ovary, giant axons, and brain of the heterologous squid, L. opalescens. The Northern blots showed intense hybridization of the cDNAs to both the homologous and heterologous lens RNAs. Although not evident from Fig. 3, the strongest hybridization was with pSL20-1, consistent with it being one of the most prevalent crystallin mRNAs; in addition, this RNA appears highly conserved between the two species of squid being examined. The pSL11, pSL18, and pSL4 cDNAs also cross-hybridized with the heterologous lens RNAs; however, in each case the patterns were different in O. pacificus and L. opalescens (see Fig. 3).
² J. Horwitz, unpublished observations.
The squid crystallin cDNAs hybridized either more strongly (pSL11 and pSL4) or exclusively (pSL20-1 and pSL18) to the RNA from the lens than from the other tissues tested. Multiple hybridization bands obtained with pSL4 cDNA may be due to repetitive sequence in its central part (see previous section). It is unlikely that any RNAs were seriously degraded because the control octopus &-tubulin RNA hybridized as a single band of approximately 2 kilobases (Fig. 3). Further experiments are necessary in order to determine whether the hybridizations of pSL4 and pSL11 cDNAs in non-lens tissues represent expression of these genes.
Squid Crystallin Genes-In order to begin characterizing the squid crystallin genes, Southern blots of genomic DNA digested with EcoRI, HindIII, or BamHI were hybridized with EcoRI-HindIII fragments isolated from pSL11, pSL20-1, pSL18, or pSL4 (Fig. 4). These probes lacked 243 bp (SL11), 70 bp (SL18), and 160 bp (SL20-1) of the 5' ends of their cDNAs. The hybridization patterns obtained confirmed that each cDNA recognized specifically a different set of genomic sequences and suggested that some of them (especially pSL20-1) may be hybridizing to more than one gene or pseudogene, although we could not rule out allelic polymorphisms or the presence of restriction sites within introns dividing genes into pieces from these data alone.
The genomic library was next screened with a mixture of the EcoRI-HindIII probes derived from the pSL11, pSL18, and pSL4 cDNAs. After hybridization of the same filters used for screening with pSL20-1 probe, two clones hybridizing to the pSL11 probe and two hybridizing to the pSL4 probe were obtained. Although the two SL11 and the two SL4 genomic sequences each gave different patterns of BamHI or BglII bands hybridizing to their respective probes (data not shown), further analysis by hybridization with oligonucleotides corresponding to parts of the cDNAs and sequencing indicated that each pair comprised overlapping fragments of a single gene.
We conclude that S-crystallins in the squid are encoded by a gene family consisting of at least 9-10 members. pSL20-1 cDNA belongs to a subfamily of genes which are more similar to each other than to the pSL4, pSL11, and pSL18 cDNAs. The SL20 subfamily appears to comprise the major S-crystallins in the lens.
Characterization of the SL20-1 and SL11 Genes-The BglII restriction fragments from the SL20-1 and SL11 genes were cloned into the BamHI site of the Bluescript plasmid and sequenced. The lengths of the SL20-1 and SL11 genes are approximately 13.5 and 17 kilobase pairs, respectively. The exon-intron junctions were determined by comparing the gene and cDNA sequences (Fig. 5). It is interesting to note that all intron sequences near the exons (with the exception of intron 2 in SL11) contained more than 50% T residues. Since other squid genomic sequences have not been characterized, it is not known at present whether this is a common feature of squid introns or if it is a specific characteristic of this family of squid genes. The size of each intron was estimated by a combination of direct sequencing, restriction mapping, and PCR. The SL20-1 gene contains six exons, while the SL11 gene has five exons. The central peptide in the deduced protein of the pSL20-1 cDNA, which is neither present in the pSL11 cDNA nor has a homologue in the rat GST subunits, is encoded precisely in exon 4 of the SL20-1 gene. Moreover, partial sequencing showed that a 14-amino acid central peptide in SL20-4, which is not represented in the rat GST proteins, is also encoded in a separate exon (data not shown). The exon-intron boundaries of the SL20-1 and SL11 genes coincided identically with respect to the homologous amino acid sequences that they encode (Fig. 1). Indeed, they were both very similar to those present in the rat GST genes, with greatest similarity occurring with the Yp gene. The position of intron 1 in squid S-crystallin genes coincides exactly with the position of intron 2 in the rat GST Ya gene, while the positions of all other introns (with the exception of intron 4 in the SL20-1 S-crystallin gene) are identical in squid S-crystallin genes and the rat GST Yp gene. The additional exon 4 in the squid SL20-1 and SL20-4 genes encoding the internal peptide that is not represented in mammalian GST proteins was probably inserted after separation of these evolutionary pathways.

[FIG. 8. 5'-flanking sequences of the SL20-1 (A) and SL11 (B) genes.]
Although the expressed sequences of the SL20-1 and SL11 genes were essentially the same as their cDNA counterparts, several differences in individual base pairs were noted (Fig. 6). Nine of the 11 differences in the SL20-1 gene are present in the protein coding sequence, with two being found in the 3'-nontranslated region, and eight of the 13 differences in the SL11 gene are found in the coding sequence, with one present in the 5'- and four in the 3'-nontranslated sequence. Since most of these differences are located in the coding region and none of them leads to changes in the amino acid sequences, we presume that they represent polymorphic positions. It is possible, of course, that the cDNAs were derived from different genes than the ones we analyze here.
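Whether a base difference changes the encoded amino acid can be checked mechanically by translating the affected codon in both sequences. The sketch below assumes the standard genetic code and two in-frame coding sequences of equal length; the function name and toy sequences are hypothetical and are not taken from the SL20-1 or SL11 genes.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_differences(cdna: str, gene: str) -> list[tuple[int, str, str, bool]]:
    """Return (1-based position, cDNA codon, gene codon, is_synonymous) per differing base."""
    if len(cdna) != len(gene) or len(cdna) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    results = []
    for i, (a, b) in enumerate(zip(cdna, gene)):
        if a != b:
            codon_start = i - i % 3
            c1 = cdna[codon_start:codon_start + 3]
            c2 = gene[codon_start:codon_start + 3]
            results.append((i + 1, c1, c2, CODON_TABLE[c1] == CODON_TABLE[c2]))
    return results


# Hypothetical toy coding sequences differing at two third-codon positions.
for row in classify_differences("ATGGCTAAA", "ATGGCCAAG"):
    print(row)
```

Differences that all come back synonymous, as reported above for both genes, are what one would expect of silent polymorphisms rather than of distinct protein variants.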
Primer extension experiments were performed to determine the transcription initiation sites and allow the identification of the 5'-flanking sequences associated with each gene. The extended cDNAs indicated that the SL20-1 gene has two and the SL11 gene three putative initiation sites (Fig. 7). The abundance of extended SL20-1 cDNAs was severalfold greater than that of SL11 cDNAs, supporting the idea that the SL20-1 gene is expressed more highly than the SL11 gene in the lens. We cannot definitively exclude the possibility that the multiple extended cDNAs were derived from different cross-hybridizing mRNAs; however, the similarity in the length and amounts of the two extended SL20-1 cDNAs and of the three SL11 cDNAs is consistent with the interpretation that they represent multiple transcription initiation sites of their respective genes.
The 5'-flanking sequences of the SL20-1 and SL11 genes show considerable differences although both are rich in T (Fig. 8, A and B). Both have a putative TATA box (Fig. 8, A and B). Interestingly, a 13-bp sequence (5'-GACTGCATAGATA-3') immediately downstream of the TATA box in the SL20-1 gene has only 1 bp mismatch with a sequence at positions -35/-23 in the mouse αB-crystallin gene (36). Both squid genes have an AP-1 consensus sequence (underlined in A and B) 15 bp (SL20-1) and 45 bp (SL11) upstream of the designated TATA box. The promoter sequence of another squid S-crystallin gene, SL20-3, also contains an AP-1 consensus sequence 15 bp upstream of the TATA box (data not shown). Many crystallin genes of vertebrates also contain an AP-1 site in their 5'-flanking sequence (46), as do functional 5' regulatory sequences of the human π gene (47) and rat Yp gene (48) of GST. Mutagenesis of an AP-1-like site in the chicken βB1-crystallin promoter greatly reduces its activity in transfected embryonic chicken lens epithelial cells and its ability to bind a putative transcriptional factor in chicken lens nuclear extracts (49); moreover, deletion or site-directed mutagenesis of these sites in the rat (48, 50) or human (47) GST genes also reduced their function.
The AP-1 site overlaps with a sequence similar to a regulatory cis-element called the antioxidant responsive element (GTGACNNNGC) in the SL20-1, SL20-3, and SL11 genes and, in addition, with a sequence similar to the xenobiotic responsive element (TNGCGTG) in the SL11 gene. Both the antioxidant responsive element and xenobiotic responsive element are involved in the induction of rat GST Ya and quinone reductase gene expression (51-54). An antioxidant responsive element-like element is also present in the 5'-flanking region of the guinea pig ζ-crystallin gene,³ which has quinone reductase activity (55). Since oxidative insult is a cause of cataract in vertebrate lenses (56), use of a cis-element responsive to oxidative stress could be envisioned to provide a selective mechanism for recruiting protective enzymes as lens crystallins.
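Searching a flanking sequence for the two consensus elements quoted above (GTGACNNNGC and TNGCGTG, with N any base) can be sketched with regular expressions. This is an illustration only: the helper names and the toy sequences are hypothetical, exact consensus matching on both strands is an assumption, and "similar to" in the text implies that the real sites may deviate from the consensus.

```python
import re

# Consensus elements as given in the text; N stands for any base.
MOTIFS = {"ARE": "GTGACNNNGC", "XRE": "TNGCGTG"}


def motif_to_regex(motif: str) -> str:
    """Translate a consensus with N wildcards into a regular expression."""
    return "".join("[ACGT]" if base == "N" else base for base in motif)


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def scan(seq: str, motif: str) -> list[tuple[int, str]]:
    """Return sorted (0-based start on the given strand, strand) hits, overlaps allowed."""
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    hits = [(m.start(), "+") for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.start() - len(motif), "-") for m in pattern.finditer(rc)]
    return sorted(hits)


# Hypothetical toy sequences, not the squid promoters.
print(scan("AAGTGACTCAGCTT", MOTIFS["ARE"]))
print(scan("CCTAGCGTGAA", MOTIFS["XRE"]))
```

The lookahead form of the pattern reports overlapping occurrences, which matters when elements overlap one another as described for the AP-1 and antioxidant responsive element-like sequences.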
Other interesting features included in these 5'-flanking sequences are a CAAT sequence at positions -94/-91 associated with the SL20-1 gene (Fig. 8A) and a stretch of repeating CAT sequences at positions -1096/-1061, an alternating TA sequence at positions -600/-573, and a markedly T-rich region at positions -208/-163 associated with the SL11 gene. Similar strings of T residues can be found in the 5'-flanking regions of the chicken βA3/A1-crystallin⁴ and α-enolase/τ-crystallin (57) genes.
Functional Characterization of the SL20-1 and SL11 Promoters in Transfection Experiments-Since the homologous system for the expression of these S-crystallin genes is not available, we resorted to transfection in vertebrate cells as a first step for the functional characterization of squid promoter sequences. The transfection data were normalized relative to those obtained in parallel chicken pαA-CAT transfections (57). The promoterless parent plasmid pSVOATCAT exhibited no detectable background activity.
The results showed that the squid S-crystallin promoters functioned in the N/N1003A cells (Fig. 9). The activity of the SL11 promoter was always higher while the activity of the SL20-1 promoter was always lower than that of the chicken αA-crystallin promoter. Minimal promoter fragments of -84/+37 (SL20-1) and -111/+89 (SL11) were needed for activity in the transfected lens cells. Further deletions to -38 (SL20-1) and -60 (SL11) abolished promoter function despite the continued presence of the putative TATA boxes. Interestingly, the consensus AP-1 sites are contained in the region of the promoters required for function. Reversing the orientation of the largest promoter of each gene resulted in loss of activity (Fig. 9). All SL20-1 and SL11 promoter constructions functioned 15-20 and 3-5 times better, respectively, in the lens cells than the kidney cells (data not shown). Additional transfection experiments indicated that CAT constructs driven by promoters of the mouse major histocompatibility (H2Kk),⁵ chicken α-actin (58), and chicken β-globin (59) genes also functioned considerably better in rabbit lens N/N1003A cells than in rabbit kidney RK13 cells (data not shown). Thus, while our transfection experiments delimited a functional promoter in the squid genes capable of working in vertebrate lens cells, they do not establish lens preference.
³ P. Gonzalez, I. Rodriquez, and T. Borras.
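The normalization of reporter data described in this section can be sketched as a two-step calculation. The sketch assumes that CAT activity in each dish is first divided by the cotransfected β-galactosidase activity and that the mean is then expressed relative to the reference promoter measured in parallel; the text states only that data were normalized to the parallel reference transfections, so the first step is an assumption. The function name and all numbers are hypothetical.

```python
def relative_promoter_activity(
    test: list[tuple[float, float]],
    reference: list[tuple[float, float]],
) -> float:
    """Percent activity of a test promoter relative to a reference promoter.

    Each list holds (CAT activity, beta-galactosidase activity) pairs from
    replicate dishes; CAT is divided by beta-gal to correct for transfection
    efficiency before averaging.
    """
    def mean_normalized(samples: list[tuple[float, float]]) -> float:
        return sum(cat / bgal for cat, bgal in samples) / len(samples)

    return 100.0 * mean_normalized(test) / mean_normalized(reference)


# Hypothetical duplicate dishes for one test construct and the reference.
test_dishes = [(200.0, 2.0), (220.0, 2.0)]
reference_dishes = [(100.0, 2.0), (100.0, 2.0)]
print(relative_promoter_activity(test_dishes, reference_dishes))
```

Expressing every construct against the same parallel reference makes values comparable across experiments, but, as the final paragraph notes, it does not by itself distinguish promoter-specific from cell line-specific effects.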