Multiple genes provide the basis for antifreeze protein diversity and dosage in the ocean pout, Macrozoarces americanus.

The ocean pout (Macrozoarces americanus) produces a set of antifreeze proteins that depresses the freezing point of its blood by binding to, and inhibiting the growth of, ice crystals. The amino acid sequences of all the major components of the ocean pout antifreeze proteins, including the immunologically distinct QAE component, have been derived by Edman degradation. In addition, sequences of several minor components were deduced from DNA sequencing of cDNA and genomic clones. Fifty percent of the amino acids are perfectly conserved in all these proteins as well as in two homologous sequences from the distantly related wolffish. Several of the conserved residues are threonines and asparagines, amino acids that have been implicated in ice binding in the structurally unrelated antifreeze protein of the righteye flounders. Aside from minor differences in post-translational modifications, heterogeneity in antifreeze protein components stems from amino acid differences encoded by multiple genes. Based on genomic Southern blots and library cloning statistics there are 150 copies of the 0.7-kilobase-long antifreeze protein gene in the Newfoundland ocean pout, the majority of which are closely linked but irregularly spaced. A more southerly population of ocean pout from New Brunswick in which the circulating antifreeze protein levels are considerably lower has approximately one-quarter as many antifreeze protein genes. Thus, there appears to be a correlation between gene dosage and antifreeze protein levels, and hence the ability to survive in ice-laden seawater. Southern blot comparison of the two populations indicates that the differences in gene dosage were not generated by a simple set of deletions/duplications. They are more likely to be the result of differential amplification.

The Newfoundland ocean pout, Macrozoarces americanus, produces a family of at least 10 active antifreeze polypeptides (AFP)¹ to prevent it from freezing (1). These AFP occur in concentrations of approximately 20-25 mg/ml in the serum during the winter months and are retained at much lower concentrations during the summer (2). Studies from our laboratories (1) have shown that the ocean pout AFP (type III) are strikingly different in amino acid composition from the alanine-rich α-helical type I AFP isolated from winter flounder (3-5) and shorthorn sculpin (6, 7), and the cystine-rich type II AFP of the sea raven (8). Ocean pout AFP are all in the molecular weight range of 6000-7000 but can be fractionated into five distinct groups based on their behavior on ion exchange chromatography. One group binds to QAE-Sephadex (QAE-1) and four to SP-Sephadex (SP-1 to -4) (1). On reverse phase high performance liquid chromatography (HPLC) the QAE-1 group shows a single peak while each of the SP groups contains several components. Besides the difference in their binding properties on ion exchange chromatography, the QAE-1 group differs from all the SP components in immunological cross-reactivity. Antisera to QAE-1 react poorly to the SP components and vice versa (1).
Recently, the amino acid sequences of the three components in the SP-1 group have been deduced by a combination of protein and cDNA sequencing (9). Whereas the sequence of winter flounder AFP enabled its secondary structure and a mechanism of action to be predicted directly (10), the ocean pout AFP sequences do not provide any obvious clues to their secondary structure or to how they function as antifreezes. To see which amino acids are most conserved and might, therefore, play a key role in the structure and/or function of the ocean pout antifreezes, we have determined the amino acid sequences of QAE-1 and the major components from SP-2, SP-3, and SP-4, and have derived other sequences from cDNA and genomic clones. In addition we have compared the ocean pout sequences to two AFP sequences derived from genomic clones of the wolffish (Anarhichas lupus), a related zoarcid from a different family (11). Approximately 50% of the 65 amino acids are perfectly conserved between all 13 sequences compared. Five of the conserved residues are asparagine and/or threonine, amino acids which have been implicated in the binding of winter flounder AFP to ice crystals (10).
¹ The abbreviations used are: AFP, antifreeze polypeptide(s); HPLC, high performance liquid chromatography.
Figure legend (reverse phase HPLC of ocean pout AFP). Samples were analyzed on a Waters µBondapak C-18 column (7.8 mm, inner diameter × 30 cm) using a gradient of acetonitrile in 0.1% trifluoroacetic acid with a flow rate of 1 ml/min. a, G-75 AFP; b, QAE-1; c, SP-1; d, SP-2; e, SP-3; f, ...
Ocean pout from New Brunswick, where winter seawater temperatures are appreciably higher than in Newfoundland, produce the same variety of AFP but in concentrations approximately one-tenth those observed in the Newfoundland specimens (2). To determine the basis for the differential expression of antifreeze we have investigated the extent and organization of the AFP multigene family in the two populations.
The Extent of Sequence Conservation in Ocean Pout AFP-The amino acid sequences of all the major ocean pout AFP components (HPLC-1, -4, -6, -7, -9, -11, and -12) are shown aligned with the two sequences derived from cDNA cloning, two sequences predicted from genomic cloning (this study), and two from the related wolffish (31) (Fig. 3). The sequences range from 62 to 69 amino acids in length. Most of the length variation occurs at the termini and is due to post-translational processing (9). The overall sequence identity among the 13 components aligned is approximately 50%. Many of the nonidentical residues include sets of conservative substitutions such as isoleucine for methionine or leucine, and valine for alanine. Sequence identity increases to 90% among the ocean pout SP components, and the two wolffish sequences are far more similar to these sequences (85% identical) than ocean pout SP sequences are to the QAE sequences (55% identical).
* Portions of this paper (including "Experimental Procedures," part of "Results," Figs. 2 and 4-8, and Tables 1-3) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are included in the microfilm edition of the Journal that is available from Waverly Press.
Four of the perfectly conserved residues (14, 18, 47, and 55) are threonines or asparagines. In addition, position 8 is occupied by threonine or asparagine, and position 15 by threonine, except in cDNA c10 where this residue is replaced by serine. The potential significance of this observation lies in the fact that the structurally different AFP of the flounder (type I AFP) also has six or seven conserved threonine and asparagine residues (23) that are believed to be responsible for binding to ice crystals (10).
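The identity figures quoted in this section (approximately 50% overall, 90% among SP components) come from comparing aligned sequences column by column. The following is a minimal sketch of that kind of calculation, not the procedure used in this study. The three short sequences are hypothetical placeholders rather than the ocean pout sequences of Fig. 3, and the function names are illustrative assumptions.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of aligned columns in which both sequences carry the same residue.

    Columns with a gap ('-') in either sequence are skipped.
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def conserved_positions(alignment):
    """1-based column numbers where every sequence has the same non-gap residue."""
    return [
        i + 1
        for i, column in enumerate(zip(*alignment.values()))
        if column[0] != "-" and len(set(column)) == 1
    ]


# Hypothetical toy alignment (NOT the sequences reported in this paper).
toy = {
    "seqA": "NQASVVANTLIP",
    "seqB": "NQESVVATTLIP",
    "seqC": "NKASVVANTMIP",
}

for (name_a, seq_a), (name_b, seq_b) in combinations(toy.items(), 2):
    print(f"{name_a} vs {name_b}: {percent_identity(seq_a, seq_b):.1f}% identical")

print("Perfectly conserved columns:", conserved_positions(toy))
```

Applied to a real alignment, the list of perfectly conserved columns would be the starting point for spotting candidate ice-binding residues such as the threonines and asparagines discussed above.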
AFP Gene Dosage Differs Markedly between the Newfoundland and New Brunswick Populations-Southern blotting shows New Brunswick ocean pout to have approximately one-fifth to one-quarter the number of AFP genes of Newfoundland fish (Fig. 10A). Even more striking, the patterns of hybridizing fragments are different between the two populations, indicating that one could not be generated from the other by a deletion or a simple set of duplications. Indeed, differences in the banding pattern between the two New Brunswick individuals suggest that the organization of the AFP gene locus might be quite fluid. In contrast, the disposition of genes in the β-tubulin multigene family is indistinguishable between both individuals and populations (Fig. 10B).
DISCUSSION
Size and Organization of the AFP Multigene Family-An estimate of the number of AFP genes in the ocean pout genome can be made on the basis of the frequency with which AFP genomic clones appeared in the library screenings. A total of 20 clones was detected in 1.2 × 10⁴ recombinant phage screened. Given a haploid genome size for the ocean pout of 1 × 10⁹ base pairs, an observed average phage insert length of 1.7 × 10⁴ base pairs, and an average of 1.5 AFP genes/clone, the estimated AFP gene copy number for the Newfoundland fish is 150. This is not an unreasonable figure in light of the number and intensity of hybridization signals seen on the genomic Southern blot (Figs. 6 and 10). Since none of the enzymes used to prepare this blot cut within the genes or cDNAs analyzed, it appears that each band on the genomic blot represents at least one gene. In effect the genes are so short (~0.7 kilobase pair) and so closely linked that several may be present in some of the longer hybridizing restriction fragments. Indeed, based on the maps of λOP 6, 12, 21, and 23, there are genomic EcoRI and BamHI fragments with at least two AFP genes apiece.
Also, as mentioned previously, several of the hybridizing bands in the genomic blot appear to be composed of signals from multiple restriction fragments. For example, the maps in Fig. 7 show that λOP 5, 12, 19, and 23 each contain a 2-kilobase pair HindIII fragment that hybridizes to the cDNA probes and that could contribute to the intense band of hybridization in this size range in the HindIII lane of the genomic Southern blot. These observa[tions] ...
Based on the analysis of genomic clones the majority of the AFP genes appear to be closely linked. However, they are not regularly spaced, and they lack a repeating pattern of restriction sites around the genes. Thus, in comparing their organization to the arrangements of type I AFP genes found in the winter flounder, they resemble the 10-12 irregularly spaced genes rather than the set of 220 genes in regular tandem direct repeats (14).
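The copy-number estimate given earlier in this section can be checked with a few lines of arithmetic. The text does not spell out the formula, so the sketch below assumes the usual library-screening reasoning: convert the number of phage screened into haploid genome equivalents, scale the positive clones to one genome, and multiply by the average number of genes per clone. The function and parameter names are illustrative only.

```python
def estimate_gene_copies(positive_clones, phage_screened, insert_bp,
                         genome_bp, genes_per_clone):
    """Estimate gene copies per haploid genome from library screening statistics.

    Assumed reasoning (not stated explicitly in the text):
      genome equivalents screened = phage screened * insert length / genome size
      clones per genome           = positive clones / genome equivalents
      gene copies                 = clones per genome * genes per clone
    """
    genome_equivalents = phage_screened * insert_bp / genome_bp
    clones_per_genome = positive_clones / genome_equivalents
    return clones_per_genome * genes_per_clone


# Values quoted in the Discussion for the Newfoundland ocean pout.
copies = estimate_gene_copies(
    positive_clones=20,
    phage_screened=1.2e4,
    insert_bp=1.7e4,
    genome_bp=1e9,
    genes_per_clone=1.5,
)
print(f"Estimated AFP gene copies per haploid genome: {copies:.0f}")
```

Under these assumptions the screened library covers about 0.2 genome equivalents, and the result comes to roughly 147 copies, in line with the rounded figure of 150 given in the text.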

The dramatic difference in the size of the AFP gene family between the Newfoundland and New Brunswick ocean pout populations can largely account for the order of magnitude difference in their circulating AFP concentrations (2). A similar correlation between gene dosage and AFP levels has been noted before in closely related species of righteye flounders (26). However, in this instance the difference occurs within a single species, which shortens the potential time frame for its development. As pointed out above, a comparison of the patterns of AFP gene hybridization between the Newfoundland and New Brunswick ocean pout (Fig. 10) suggests that one has not been derived from the other by a simple set of deletions/duplications since so many of the gene fragments are of different lengths in the two populations, and even between the two New Brunswick individuals. Instead, the AFP gene locus appears to be subject to extensive rearrangement. We suggest that the periodicity of Cenozoic glaciation in the Northern hemisphere has provided intense but sporadic selection for ocean pout that produce high levels of AFP. This challenge has been met on a number of occasions by AFP gene amplification through some form of disproportionate DNA replication (27,28). Subsequently, during glacial minima, a net loss of AFP genes that can be predicted from the potential for internal recombination (29) would not be selected against until the onset of another glacial episode. Thus, the combination of an AFP gene locus predisposed to amplification and a periodic selection pressure might account for the current geographical and individual variation in the locus.
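How much of the difference in circulating AFP the gene dosage can account for may be gauged with a rough calculation. The sketch below takes the figures reported in this paper (20-25 mg/ml winter serum AFP in Newfoundland fish, roughly one-tenth that level in New Brunswick fish, and a 4- to 5-fold difference in gene number) and assumes, purely for illustration, that serum AFP scales linearly with gene copy number. That proportionality is an assumption of the sketch, not a result of this study, and the variable names are hypothetical.

```python
# Figures taken from the text.
nf_serum_mg_ml = (20.0, 25.0)   # Newfoundland winter serum AFP, low and high
gene_fold = (5.0, 4.0)          # Newfoundland / New Brunswick gene number
level_fold = 10.0               # Newfoundland / New Brunswick serum AFP

# Illustrative assumption: serum AFP is proportional to gene copy number.
predicted_nb = (nf_serum_mg_ml[0] / gene_fold[0],
                nf_serum_mg_ml[1] / gene_fold[1])

# Level implied by the reported one-tenth ratio.
observed_nb = (nf_serum_mg_ml[0] / level_fold,
               nf_serum_mg_ml[1] / level_fold)

# Fold difference left unexplained by gene dosage alone.
residual_fold = (level_fold / gene_fold[0], level_fold / gene_fold[1])

print(f"Predicted from dosage: {predicted_nb[0]:.2f}-{predicted_nb[1]:.2f} mg/ml")
print(f"Implied by serum data: {observed_nb[0]:.2f}-{observed_nb[1]:.2f} mg/ml")
print(f"Residual difference:   {residual_fold[0]:.1f}- to {residual_fold[1]:.1f}-fold")
```

On these numbers gene dosage covers a 4- to 5-fold share of the 10-fold difference, leaving a factor of about 2 to 2.5, which is consistent with the statement that dosage can largely, but not entirely, account for the difference.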

Functionality of the Genes-It is difficult to say how many of the 150 AFP genes are functional. Protein sequence analysis indicates that of the eight major HPLC components analyzed here and in Ref. 9, only one, HPLC-5, can be produced as a post-translational modification of another. In addition there are several minor components not yet sequenced, such as HPLC-2 and -3, that could potentially correspond to the cDNA sequences of clones 7 and 10 or the genomic sequence in λOP 5. Other minor components flank HPLC-7, -9, and -11. Based on these estimates there are 10-15 different AFP components. However, the number of functional genes is likely to be larger since sequence analysis of cDNA clones 36 and 77 obtained from an individual fish has shown that the same mRNA (HPLC-6) is produced from at least two genes that differ by silent base changes (9). Some of the genes may even be exact duplicates. To obtain a more accurate figure for the number of functional AFP genes it would be necessary to sequence a large number of them. The two genes (λOP 3 and 5) that have been sequenced both appear to be functional although they do not code for any of the major components. They show extensive DNA sequence conservation around the putative control elements and even in their intervening sequences.
The Type III Antifreeze-Through a combination of protein and DNA sequencing a fairly complete picture of ocean pout AFP component heterogeneity has emerged (Fig. 3). Perhaps the most surprising feature is the extensive sequence difference between the SP and QAE components. This difference is greater than might be predicted from a comparison of their amino acid compositions and chromatographic properties but is in line with immunological data (1). Their comparison does, however, serve to identify conserved residues that might be important in ice binding or in the folding of the type III AFP in such a way as to present the ice-binding residues in the correct configuration. This information will be especially valuable when the first type III AFP tertiary structure is derived by physical methods.

Sciences Research Laboratory, Memorial University of Newfoundland and the Huntsman Marine Laboratory, St. Andrews, New Brunswick for collection of ocean pout tissue, Sherry Gauthier for DNA sequence analysis on λOP 3 and 5, Dr. Don Cleveland for the gift of the chicken β-tubulin cDNA clone, and Angela L'Abbé for typing the manuscript.
EXPERIMENTAL PROCEDURES
Collection of Tissue. Ocean pout, Macrozoarces americanus, were collected by divers from waters around the Avalon Peninsula, Newfoundland, and from Passamaquoddy Bay, New Brunswick. The fish were held in aquaria at ambient temperature and photoperiod prior to sampling (2).
Serum was prepared by centrifugation (4,000 × g for 15 min) of clotted blood and stored at -20°C. Liver and testes were removed into liquid N2 and stored at -60°C.
Purification of Ocean Pout AFP. The procedure for the purification of ocean pout AFP has been described (1, 9). Briefly, the serum was initially chromatographed on a Sephadex G-75 column (2.5 × 86 cm).
The thermal hysteresis activity of each fraction was measured with a freezing point osmometer. Active fractions were pooled and designated as G-75 AFP. To fractionate the AFP into SP and QAE components, G-75 AFP was chromatographed on a QAE-Sephadex A-25 column in 5 mM Tris HCl, pH 9.5. All of the SP components passed through the column, allowing QAE-1 to be eluted by a NaCl gradient as a single peak. Subsequent chromatography of the nonretarded materials from the QAE A-25 column on SP-Sephadex resolved the fractions into 4 groups, SP-1 to SP-4 (9). The homogeneity of these fractions was examined by reverse phase HPLC using a Waters µBondapak C-18 column (7.8 mm × 30 cm).
Liver poly(A)+ RNA was prepared from Newfoundland ocean pout and size fractionated on a sucrose density gradient as described previously (9). AFP mRNA sedimenting in the 10 S peak was used as template to synthesize double-stranded cDNA by the method of Land et al. (19).
The cDNA was inserted into the PstI site of pUC-9 by homopolymeric tailing, with an oligo(dG) tail being added to the cDNA and an oligo(dC) tail to the plasmid. In this way the PstI site was destroyed. The plasmid/cDNA chimeras were transfected into E. coli JM83 and ...

RESULTS
The amino acid compositions of the major AFP components are shown in Table 1. Both HPLC-1 and -6 lack arginine while HPLC-12 lacks phenylalanine. HPLC-12, the only component that binds to QAE-Sephadex at pH 9.5, has a distinct composition compared to the other components.
Nonetheless, they are very similar overall.
Primary Structure of HPLC-12. This sequence was deduced from automated Edman degradation of selected tryptic, chymotryptic, and CNBr-cleavage peptides, together with 28 cycles of Edman degradation of the undigested peptide (Tables 2 and 3). Cleavage ...
Primary Structure of HPLC-4, -7, -9, and -11. Unlike the N-termini of HPLC-1, -5, and -6, which were blocked due to the cyclization of glutaminyl residues, the other major AFP components were directly accessible to automated protein sequencing. HPLC-4, -7, -9, and -11 were unequivocally deduced from 65 cycles of ...
AFP cDNA Clones. The impetus to make a new cDNA library came from the observation that cDNA clones previously isolated (9) were lacking sequence information at the 3' end and possibly also at the 5' end. The new library was made by the method of Land et al. (19). Inserts were released from the multiple cloning site by digestion with both EcoRI and HindIII and their lengths estimated by agarose gel electrophoresis. On average their length was 100-200 bp longer than that of the inserts from the previous library. The inserts from two new AFP cDNA clones (7 and 10), with estimated lengths of 400 bp and 550 bp, respectively, were sequenced (Fig. 5). The 3' end of both cDNAs appears to be intact as judged by the presence of a complete polyadenylation signal (AATAAA), which was incomplete or absent from earlier clones, and a poly(A) tract downstream.
Clone 10 is actually shorter at the 5' end than clones 36 and 77 (9), but clone 7 has an extension of 251 bp beyond the 5' end of clone 77. Clones 7 and 10 both code for SP-type AFP components as opposed to the rarer QAE-type, and both represent minor variants most closely related to HPLC-1 (Fig. 3).
... the darker signals appeared ... imposed.
To help evaluate signal intensity a duplicate Southern blot was probed with an ocean pout liver cDNA clone unrelated to AFP. This clone, 15, contained a 450 bp cDNA insert complementary to an mRNA of 750 nucleotides in length that was one or two orders of magnitude less abundant in ocean pout liver than the AFP mRNA ... the AFP cDNA probe. Other details of the hybridization, washing ... (24).
To study the organization of the AFP genes in more detail, Newfoundland ocean pout genomic DNA was partially digested with Sau3A, size fractionated, and cloned into Charon 10. When 12,000 unamplified, independent recombinant phage were screened (25) with labelled AFP cDNA clone 36 (9), 20 positive signals were obtained. Eight of these putative AFP genomic clones were selected at random, plaque-purified, and prepared on a large scale by banding on CsCl gradients. The phage DNAs from these eight clones (λOP 3, 5, 6, 12, 19, ...) ... Fig. 8. The DNA sequences showed that both hybridizing regions contain complete AFP genes which are each interrupted by a single, short intervening sequence of 177 bp in λOP 3, and 182 bp in λOP 5 (Fig. 9). The intervening sequence effectively divides the gene into an exon that codes for the signal peptide and one that codes for the mature AFP. Both genes have conventional splice junction sequences, initiation and termination codons, and typical polyadenylation signals. In λOP 5 a "TATA" box is located 118 bp upstream of the initiation codon and a "CAAT" box 45 bp further upstream.
In λOP 3 the same signals can be observed in the equivalent locations. The spacing between these two signals and their ... 5' and 3' untranslated regions and of the signal peptide are sufficiently similar to account for the selection of clones like λOP 3 from the genomic library when using AFP cDNA clone 36 as a probe.
Table 1. Amino acid analysis of ocean pout antifreeze polypeptides. Yield of amino acids is tabulated as nmol. Column I represents the number of residues/protein calculated from amino acid analysis. The numbers in column II were derived from protein sequence determinations. The data for HPLC-6 is from reference 9.
Table 2. Amino acid analysis of cleavage peptides from HPLC-12. Yield of amino acids is tabulated as nmol. Numbers in parentheses represent the number of residues/peptide based on sequence data and are totalled below. Hsr is homoserine.
Table 3. Automated Edman degradation of HPLC-12 and its cleavage peptides. Residues are numbered from the amino terminus. Dotted lines signify the continuation of the sequence beyond the last residue determined.