Ovoinhibitor introns specify functional domains as in the related and linked ovomucoid gene.

We have isolated cDNA clones and determined the gene structure of chicken ovoinhibitor, a seven domain Kazal serine proteinase inhibitor. Using RNA blot hybridization analysis, the gene was identified initially as a region 9-23 kilobases upstream of the gene for the related inhibitor ovomucoid. Ovoinhibitor RNA appears in oviduct and liver. cDNA clones were identified by screening an oviduct cDNA library with a nick-translated DNA restriction fragment which contained an exon of the gene. The mature protein sequence derived from a cDNA clone is in excellent agreement with that which we obtained from direct sequencing of purified ovoinhibitor. The protein-sequencing strategy is reported. The P1 amino acids of the Kazal domains are consistent with the known broad inhibitory specificity of ovoinhibitor. The gene is about 10.3 kilobases in length and consists of 16 exons. Each Kazal domain is encoded by two exons. Like ovomucoid, introns fall between the coding sequences of the ovoinhibitor domains, an arrangement which may have facilitated domain duplication. The intradomain intron occurs in an identical position in all of the ovoinhibitor and ovomucoid Kazal domains, suggesting that this intron was present in the primordial inhibitor gene. We discuss the location of the intradomain intron in relation to the known structure of four Kazal inhibitors and suggest a scheme for the evolution of the ovoinhibitor gene.
Chicken ovomucoid, one of the major egg-white proteins, accounts for about 10% of the protein produced by the tubular gland cells of the oviduct of laying hens (Palmiter, 1972). Ovomucoid is a member of the Kazal family of serine proteinase inhibitors. The chicken genome probably contains the genes for at least four Kazal inhibitors. One of these Kazal inhibitors, ovoinhibitor (Matsushima, 1958), is also present in chicken egg white but at about one-tenth the amount of ovomucoid (Liu et al., 1971). Ovoinhibitor is, however, a major [...] the a domains of chicken ovomucoid are more homologous to each other than the b domains. Chicken ovoinhibitor consists of six a domains and one b domain. As in ovomucoid, the b domain of ovoinhibitor is at the carboxyl terminus. It has been suggested that the genes of multidomain Kazal inhibitors such as ovomucoid and ovoinhibitor arose by duplication of a gene that coded for one a and one b domain, the latter occurring at the carboxyl terminus of the protein. Subsequently, the a domains must have duplicated in both genes.
The chicken ovomucoid domains are each encoded by two exons (Stein et al., 1980). The domain boundaries correspond to intron-exon junctions. Since the intradomain intron is located in an identical position in all three domains, Stein et al. (1980) suggest that the primordial inhibitor gene may have consisted of only the first exon of an inhibitory domain, which encodes the reactive site and that the intradomain intron was created by the subsequent addition of the second exon.
The studies presented here were conducted to examine the hypotheses of Laskowski et al. (1980) and Stein et al. (1980) for the evolution of Kazal inhibitors by determining the gene structure of other chicken Kazal inhibitors. In particular, we wanted to isolate the gene for ovoinhibitor. As there are many examples of the close linkage of related genes that are expressed in the same tissue (Colbert et al., 1980; Efstratiadis et al., 1980), our approach was to determine which sequences in a 47-kb region around the ovomucoid gene were expressed in the oviduct and homologous to ovomucoid. This analysis identified a region about 9-23 kb upstream from the 5' end of the ovomucoid gene that codes for ovoinhibitor. Introns separate the exons coding for the Kazal domains and signal peptide of ovoinhibitor in a manner identical to that of the ovomucoid gene. We find that the ovoinhibitor gene structure and sequence are consistent with the hypothesis of Stein et al. (1980). We suggest a possible scheme for the evolution of the ovoinhibitor gene.

The abbreviations used are: kb, kilobases; bp, base pairs; hPSTI, human pancreatic secretory trypsin inhibitor; Spase, Staphylococcus aureus V8 proteinase.

EXPERIMENTAL PROCEDURES
General-Procedures for the preparation of phage and plasmid DNA have been previously described (Kulomaa et al., 1986). Individual DNA fragments were purified from agarose gels by the method of Dretzen et al. (1981). The sequencing of isolated DNA fragments was performed using either the method of Maxam and Gilbert (1977) or of Sanger et al. (1977). For the latter procedure the DNA fragments were cloned into M13 phage vectors (Messing, 1983). DNA and protein sequences were analyzed using the Microgenie programs from Beckman. All enzymes were purchased from either New England Biolabs or Boehringer Mannheim and used according to the suppliers' recommendations.
RNA Blot Hybridization Analysis-Total RNA from various tissues was prepared, resolved on formaldehyde-agarose gels and transferred to nitrocellulose filters as previously described (Sargan et al., 1986). The filters were hybridized to nick-translated DNA fragments according to the procedures of Sargan et al. (1986).
DNA Blotting and Hybridization-Restriction enzyme-digested plasmid DNAs were resolved in agarose gels and transferred to nitrocellulose filters according to the procedures of Southern (1975). The filters were hybridized to nick-translated DNA fragments at moderate stringency using the conditions described by Levine et al. (1985).
Isolation of Ovoinhibitor cDNA Clones-An oligo(dT)-primed hen oviduct λgt11 cDNA library was screened as previously described (Kulomaa et al., 1986). The probe used was a nick-translated 0.7-kb XhoI fragment from the clone OI4.3. Several cDNA clones were isolated and restriction mapped according to established procedures (Colbert et al., 1980). Overlapping DNA fragments of the longest clone, OIc, were sequenced.
Protein Sequence of Ovoinhibitor-"Crude ovomucoid" was prepared as described in Kato et al. (1987) and applied to a Bio-Gel P-10 column equilibrated with 5% formic acid. The large molecular weight fraction is "crude ovoinhibitor," and the smaller molecular weight fraction is ovomucoid. Crude ovoinhibitor was further purified on a CM-Sepharose column at pH 5.5 developed with a NaCl gradient. Sequencing was carried out on a Beckman 890C sequenator. The procedures employed are given in Kato et al. (1987) and in Laskowski et al. (1987). The strategy used is given in Fig. 3. This strategy emphasized hydrolysis of the intact protein either at interdomain connecting peptides or at the reactive sites; thus, some of the products may be of interest for future studies of the behavior of isolated domains or of reactive site-modified inhibitors.

RESULTS
Identification and Localization of the Ovoinhibitor Gene-From a cosmid library of chicken genomic DNA, we have isolated overlapping clones that contain the ovomucoid gene and about 23 kb of 5'- and 17 kb of 3'-flanking DNA. We investigated whether any of the cloned DNA sequences flanking the ovomucoid gene were also transcribed in the oviduct. To this end, total RNA from either oviduct or liver was resolved in agarose-formaldehyde gels and transferred to nitrocellulose. The filters were hybridized at high stringency to nick-translated nonrepetitive DNA fragments from subclones of the ovomucoid gene region. DNA fragments from the clones OI3.7, OI2.0, OI1.4, and OI2.3 all hybridized to a 1.7-kb RNA present in both oviduct and liver total RNA (Fig. 1). These clones span a region from about 9-23 kb upstream from the 5' end of the ovomucoid gene. In contrast, DNA fragments containing the ovomucoid gene hybridize to a 1. RNA that is present only in oviduct RNA (Nordstrom et al., 1979). DNA fragments up to 9 kb upstream of the 5' end or downstream of the ovomucoid gene did not hybridize specifically to oviduct or liver RNAs. The ovomucoid-linked gene was subsequently shown by sequence analysis of homologous DNA fragments to code for the related Kazal inhibitor, ovoinhibitor. The homologous ovoinhibitor DNA fragments that were to be sequenced were identified by DNA blot hybridization analysis. These hybridization experiments were performed to determine if the ovomucoid-linked (ovoinhibitor) gene was related to the ovomucoid gene. For this purpose, plasmid DNAs from the 47-kb cloned ovomucoid gene region were digested with various restriction enzymes, resolved on agarose gels, and transferred to nitrocellulose filters. The filters were hybridized under moderate stringency to nick-translated DNA fragments from the ovoinhibitor gene region.
A 1.4-kb XhoI-HindIII DNA fragment from the clone OI1.4 hybridized specifically to the 3.4-kb HindIII-EcoRI fragment of the clone OM14.5, which contains the exons that code for the second and third domains of ovomucoid (Lai et al., 1979a) (Fig. 2A). Similar hybridization to the OM14.5 clone was observed with probes from the clones OI2.1 and OI2.3. Additionally, a nick-translated 2.1-kb XhoI-BamHI fragment from the clone OI2.1 hybridized under moderate stringency to three DNA fragments within the ovoinhibitor gene region (Fig. 2, B and C). One of these fragments, a 0.4-kb PstI fragment from the clone OI3.0, hybridized to a 0.6-kb BglII-PvuII fragment of the clone OI2.1 (Fig. 2D). These hybridization data suggest that the ovoinhibitor gene region is both internally repetitious and homologous to the ovomucoid gene.

Fig. 2. [...] Hybridization to the vector fragments, 4.4 or 3 kb for pBR322 or pGEM, respectively, is due to contaminating pBR322 sequences in the probes used. The clone OM14.5 contains 9.1 kb of 5'-flanking DNA and all of the 5.6-kb ovomucoid gene with the exception of 119 bp of 3'-noncoding sequence (Catterall et al., 1979; 1980). In panel E the location of the fragment within OI2.1 that hybridized to the OI2.0 probe (hatched box) as well as the fragments that hybridized to the OI2.1 probe (filled boxes) are shown. The clone OI2.0 is a subclone of the 2.0-kb KpnI-HindIII fragment of OI6.5. The 5' to 3' orientation shown is the same as that of the ovomucoid gene. B, BamHI; G, BglII; H, HindIII; K, KpnI; P, PvuII; Ps, PstI; X, XhoI.
To determine the basis for the internal homology of the ovoinhibitor gene region, the four related fragments were sequenced. The homologous DNA fragments were found to share similar DNA sequences of about 90-140 bp. Over this region the four common sequences were 60-70% homologous. Additionally, the sequences common to each fragment were also all about 60% homologous to the first exons that code for the a Kazal domains of ovomucoid. Moreover, these ovomucoid-related sequences all contained an open reading frame that could potentially code for a peptide homologous to the ovomucoid Kazal domains. We therefore compared these four derived peptide sequences with protein sequences of various Kazal proteinase inhibitors. The Purdue group (I. K., W. K., M. L.) have determined the primary structure of chicken ovoinhibitor by protein sequencing (Fig. 3). It should be noted, however, that since the ovoinhibitor sequence was unknown to the rest of us until after the DNA cloning, blot hybridization, and DNA sequencing studies discussed above had been performed, the protein sequence was not utilized for the cloning and initial identification of the ovoinhibitor gene.
The peptide sequence derived from the ovomucoid-related DNA sequence of the 0.7-kb XhoI fragment from the clone OI4.3 was identical to about the first two-thirds of the first Kazal domain of ovoinhibitor. Similarly, the peptide sequences derived from the ovomucoid-related sequences of the 0.6-kb BglII-PvuII fragment from the clone OI2.1, and the 0.5-kb BamHI-KpnI and 0.4-kb PstI fragments from the clone OI3.0, were identical to about the first two-thirds of the second, third, and fourth domains of ovoinhibitor, respectively. We therefore concluded that this ovomucoid-linked gene codes for ovoinhibitor and that we had identified by sequence homology the first exons of the first four domains of the protein. Expression of the ovoinhibitor gene in oviduct and liver was consistent with the presence of the protein in egg white and plasma.
Primary Structure of Ovoinhibitor Protein Derived from a cDNA Clone and as Determined by Protein Sequencing-As the coding sequences of the 0.7-kb XhoI fragment from the clone OI4.3 represent part of the first domain and thus should correspond to sequences near the 5' end of the ovoinhibitor mRNA, this fragment was used as a probe to screen a chick oviduct cDNA library. The DNA sequence and derived protein sequence of the longest clone that was isolated, OIc, are shown in Fig. 3. The amino terminus of the mature protein corresponds to position 24 of the derived preprotein sequence. The strategy used in determining the primary structure of ovoinhibitor by protein sequencing is shown in Fig. 3. The amino acid sequence of the mature protein derived from the cDNA clone is identical to that determined by protein sequencing with one exception. The sole difference is an Asn at position 12 of the fifth domain that was identified as an Asp by sequencing of the protein. As the Asn is adjacent to a Gly, we believe the identification of the amino acid as an Asp may have been due to spontaneous deamidation of the protein some time after synthesis (Meinwald et al., 1986). The putative signal peptide sequence conforms with the consensus identified by Watson (1984). The P1 amino acids of the seven Kazal domains are Arg, Arg, Arg, Arg, Phe, Met, and Met. Thus, the first four domains could potentially inhibit trypsin, the fifth chymotrypsin, and the sixth and seventh domains could inhibit both chymotrypsin and elastase. Although it is not certain that all seven domains are independently active, chicken ovoinhibitor does contain at least five reactive sites: two for bovine trypsin, two for bovine chymotrypsin, and one for porcine pancreatic elastase (Gertler and Feinstein, 1971; Gertler and Ben-Valid, 1980).
Moreover, while none of the trypsin binding sites are sensitive to Met oxidation, half of the chymotrypsin sites and all of the elastase sites are sensitive to this modification (Schechter et al., 1977).
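The reasoning from P1 residues to potential target proteinases can be written out as a small lookup. The following is a minimal sketch: the mapping contains only the three residue-to-enzyme assignments stated in the text, and the names `P1_TARGETS`, `OVOINHIBITOR_P1`, `potential_targets`, and `count_potential_sites` are hypothetical, introduced here for illustration.

```python
# Residue -> proteinases it could target, as stated in the text only:
# Arg -> trypsin; Phe -> chymotrypsin; Met -> chymotrypsin and elastase.
P1_TARGETS = {
    "Arg": ("trypsin",),
    "Phe": ("chymotrypsin",),
    "Met": ("chymotrypsin", "elastase"),
}

# P1 residues of the seven ovoinhibitor Kazal domains, in order I to VII.
OVOINHIBITOR_P1 = ["Arg", "Arg", "Arg", "Arg", "Phe", "Met", "Met"]


def potential_targets(p1_residues):
    """Return, for each domain, the proteinases its P1 residue could target."""
    return [P1_TARGETS[residue] for residue in p1_residues]


def count_potential_sites(p1_residues):
    """Count how many domains could potentially inhibit each proteinase."""
    counts = {}
    for targets in potential_targets(p1_residues):
        for enzyme in targets:
            counts[enzyme] = counts.get(enzyme, 0) + 1
    return counts


if __name__ == "__main__":
    print(count_potential_sites(OVOINHIBITOR_P1))
```

These counts are upper bounds derived from sequence alone. The measured numbers of reactive sites quoted above (two for trypsin, two for chymotrypsin, one for elastase) are lower, which is in line with the caveat that not all seven domains need be independently active.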
Fig. 3. [...] The Asn in the fifth domain that was determined to be an Asp by protein sequencing as well as the putative polyadenylation signal AATAAA are underlined. The horizontal lines immediately below the amino acid sequences indicate those portions of the peptides described in the text for which actual protein-sequencing data were obtained. Short vertical lines mark NH2-terminals and, in those cases when the entire peptide was sequenced, the COOH-terminals. The letter code indicates the source of peptides and cleavage method used to produce the peptides as follows: 1) CNBr cleavage of intact ovoinhibitor produced two fragments, the intact first domain (1-68) designated as CN-3 and domains II-VII (69-499). The second fragment was nicked internally at Met residues 84, 352, 411, and 446, but it was held together by disulfide bridges. Performic acid oxidation yielded five peptides CN-0-1 (69-84), CN-0-2 (85- [...]

Domain Structure of the Ovoinhibitor Gene-To determine the exon-intron arrangement of the ovoinhibitor gene, we sequenced DNA fragments from within the transcribed region that were identified by RNA blot hybridization analysis. Exons were identified by sequence alignment with the OIc cDNA sequence. The DNA sequence of all exons and exon-intron boundaries was determined. The gene exon and OIc cDNA sequences were in complete agreement with the exception of one nucleotide difference in each of the coding and 3'-untranslated sequences. Both nucleotide substitutions were silent changes; that is, they did not alter the derived protein sequence shown in Fig. 3. These nucleotide differences probably reflect polymorphisms within the chicken population (Lai et al., 1979b). For those exon-intron boundaries that were ambiguous due to sequence redundancy at adjacent junctions, the putative splice site was assigned according to the GT/AG rule (Breathnach and Chambon, 1980). The ovoinhibitor gene is about 10.3 kb long and consists of 16 exons separated by 15 introns (Fig. 4).
The 3' end of the ovoinhibitor gene is about 9.5 kb upstream from the 5' end of the ovomucoid gene. The intron-exon boundary sequences are shown in Table I. All intron sequences begin with a GT, end with an AG, and in general agree with the splice junction consensus sequences determined by Mount (1983). The 5' end of the ovoinhibitor gene was determined by a primer extension assay, using an oligonucleotide complementary to the 5' end of the OIc cDNA clone as a primer, reverse transcriptase, and oviduct poly(A+) RNA. The transcription initiation site occurs 15 bp upstream of the 5' end of the cDNA sequence.
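The GT/AG rule used to place ambiguous splice sites amounts to a simple check on the first and last dinucleotides of a candidate intron. The sketch below illustrates that check only; the function names and the short sequences in the usage lines are hypothetical and are not taken from the ovoinhibitor gene.

```python
def follows_gt_ag_rule(intron):
    """Return True if the intron sequence begins with GT and ends with AG."""
    seq = intron.upper()
    return len(seq) >= 4 and seq.startswith("GT") and seq.endswith("AG")


def candidate_donor_sites(genomic, start, stop):
    """List indices i in [start, stop] at which an intron could begin,
    i.e. where the dinucleotide genomic[i:i+2] is GT.

    When cDNA and genomic sequences can be aligned equally well at several
    neighbouring positions, this narrows the choice to GT-initiated introns.
    """
    seq = genomic.upper()
    return [i for i in range(start, stop + 1) if seq[i:i + 2] == "GT"]


if __name__ == "__main__":
    # Hypothetical sequences, for illustration only.
    print(follows_gt_ag_rule("gtaagtttcag"))
    print(candidate_donor_sites("AAGGTAAGT", 0, 8))
```

The check is necessary but not sufficient: a real assignment would also weigh agreement with the wider splice junction consensus, as the comparison with the Mount (1983) consensus in the text implies.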
The first two exons of the ovoinhibitor gene code for the signal peptide and the first two amino acids of the mature protein. Each of the seven Kazal domains is encoded by two exons. The sites at which the introns occur in the coding sequences of the domains are shown in Fig. 5. For each domain one intron separates the codon for the first amino acid of the domain from the preceding codon. In addition, one intron occurs at an identical position within the coding sequences of all the domains. This intradomain intron separates the second and third nucleotides of the codon of the amino acid indicated in Fig. 5. Therefore, the coding sequences of all the a domains are bordered by introns. The coding sequences of the second exon of the seventh domain, which is b-type, are contiguous with the 3'-noncoding sequences.
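The exon arithmetic and the two intron positions described above can be made concrete with a short sketch. It assumes the conventional definition of intron phase (the number of bases of a split codon that lie upstream of the intron); the function names and the base counts in the usage lines are hypothetical, since the individual exon lengths are not all given in the text.

```python
def intron_phase(coding_bases_upstream):
    """Phase of an intron, given the number of coding bases that precede it.

    0: the intron lies between two codons (as for the interdomain introns).
    2: the intron lies between the second and third nucleotides of a codon
       (as for the intradomain intron).
    """
    return coding_bases_upstream % 3


def exon_labels(n_domains=7):
    """Label the exons: two leading exons (signal peptide plus the first two
    amino acids of the mature protein), then two exons per Kazal domain."""
    labels = ["leader exon 1", "leader exon 2"]
    for domain in range(1, n_domains + 1):
        labels.append(f"domain {domain}, exon 1")
        labels.append(f"domain {domain}, exon 2")
    return labels


if __name__ == "__main__":
    labels = exon_labels()
    print(len(labels), "exons,", len(labels) - 1, "introns")
    # Hypothetical base counts, for illustration only.
    print(intron_phase(60), intron_phase(62))
```

With seven domains the labels reproduce the 16 exons and 15 introns reported for the gene.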
Comparative Structure of the Ovoinhibitor and Ovomucoid Genes-Introns separate the coding sequences of the ovomucoid protein domains at sites identical to the corresponding coding sequences in the ovoinhibitor gene (Fig. 6). The structures of the two genes are very similar. The small 20-bp second exon of both the ovoinhibitor and ovomucoid genes codes for the carboxyl-terminal portion of the signal peptide and the first two amino acids of the mature protein. The first exon of the ovoinhibitor second and fifth domains is 3 bp shorter than the first exon of the other a domains in ovoinhibitor and ovomucoid. On alignment of the a domain sequences, it appears that these deletions are in the region between the codons for the first and second cysteines. Similarly, fewer nucleotides in the region between the codons for Cys I and Cys II of the first exons of the b domains account for the smaller size of these exons relative to the first exons of the a domains of the two genes.
We have previously demonstrated that there are two 5' splice sites that are used in the splicing of the second exon of the second chicken ovomucoid domain to the first exon of the third domain (Stein et al., 1980). The most frequently used 5' splice site defines a 67-bp exon. The other 5' splice site, which defines a 61-bp exon, occurs in an identical position at the end of all six ovoinhibitor a domain second exons. The minor splice site thus appears to correspond to the ancestral site whereas the more frequently used splice site appears to have arisen by mutation within the intervening sequence. In support of this conclusion, the dipeptide Val134-Ser135 encoded by the additional 6 bp at the 3' end of the 67-bp ovomucoid sixth exon is present only in ovomucoids of most (not all) birds belonging to the order Galliformes. That is, it appears that the second exon of the ovomucoid second domain was expanded to 67 bp only after the divergence of Galliformes from other birds. The shorter length of the second exon of the ovomucoid first domain (58 bp) relative to the putative 61-bp ancestral exon probably reflects a recent deletion in the exon or a shifting of the 5' splice site.
The sizes and sequences of the corresponding introns in the ovomucoid and ovoinhibitor genes do not appear to be conserved. An exception appears to be the intradomain introns of the b domains, which are of similar size. We have, however, sequenced about one-third of the ovomucoid intron, and over this region the ovomucoid and ovoinhibitor introns do not share significant sequence homology. The introns that flank the 20-bp second exon are relatively large in both genes. The interdomain introns are, on the average, about twice the size of the intradomain introns. There is apparently no significant sequence homology between the ovoinhibitor intradomain or interdomain introns. We have, however, only partially sequenced many of these introns. The 5'-flanking sequences of the two genes have little homology, with the exception of the TATA box and a nonanucleotide TGACAGATT, which are about 30 and 80 bp, respectively, upstream from the predominant transcription initiation sites of both genes. Interestingly, the upstream transcription initiation site of the ovomucoid gene (Lai et al., 1982; Gerlinger et al., 1982) maps to within the conserved nonanucleotide sequence motif.

[...] domains of ovoinhibitor (I-VI) are more closely related to each other than to the b-type domain VII (Fig. 7, left panel). The third domain is unusual in that all of the other a domains are more related to the third domain than to any other a domain. This homology suggests that domain III may evolve [...] of ovoinhibitor are more closely related to the a domains of ovomucoid (I and II) than to the b-type domain III (Fig. 7, right panel). Similarly, the ovoinhibitor b domain is more related to the ovomucoid b domain than to any a domain. The ovoinhibitor and ovomucoid b domains are about as homologous as are the a domains of the two genes. The ovoinhibitor a domains are about equally homologous to each of the ovomucoid a domains. These observations are consistent with the proposal that the ovoinhibitor and ovomucoid genes are descended from a gene that coded for one a and one b Kazal domain (Fig. 8). Yamamoto et al. (1985) have reported the sequence of a cDNA clone encoding the human pancreatic secretory trypsin inhibitor (hPSTI), a single domain Kazal serine proteinase inhibitor. The ovoinhibitor and ovomucoid putative signal peptide sequences are more closely related to each other (46% amino acid, 55% nucleotide homology) than to the putative signal peptide sequences of hPSTI (26 and 21% amino acid, 43 and 40% nucleotide homology for ovoinhibitor and ovomucoid, respectively). Moreover, the Kazal domain sequences of hPSTI (positions 7-58 of mature protein) have similar homology (28-37% amino acid, 40-45% nucleotide) to each of the a- and b-type domains of ovoinhibitor and ovomucoid. These observations suggest that the ovomucoid-ovoinhibitor and hPSTI ancestral genes diverged prior to the duplication that generated the primordial a and b domains of multidomain Kazal inhibitors (Fig. 8).
To estimate the branching order of the ovoinhibitor, ovomucoid, and hPSTI domains, we have used the modified UPG method of Li (1981), which corrects for unequal rates of evolution. The most likely pathway derived by comparing the protein sequences of the 11 domains is shown in Fig. 8. We have also used a protein sequence parsimony method that is similar to Fitch (1971) except that silent changes are not counted. According to this procedure the scheme shown requires the fewest changes. The same phylogenetic tree was also obtained by W. M. Fitch and M. L. by the use of the Fitch (1971) maximal parsimony algorithm on the (then incomplete) protein sequences. As expected from the above sequence comparisons, the a-b two-domain ancestor of ovomucoid and ovoinhibitor and the hPSTI ancestor each descend from a common single-domain primordial inhibitor. The branching order of the first four domains of ovoinhibitor should be regarded as somewhat uncertain because of the similarity of these sequences and also because domain III appears to have a relatively slower rate of evolution. However, it is interesting that the branching order of ovoinhibitor domains appears to be unidirectional in that the newly duplicated domain is always closer to the amino terminus of the protein.
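To illustrate how a branching order can be derived from pairwise distances, the sketch below implements plain unweighted pair-group clustering with arithmetic averaging. It is not the modified method of Li (1981): the correction for unequal rates of evolution is omitted, and the function name `upgma` and its input format are assumptions made for this example. Distances would be supplied by the user, for instance as one minus the fractional homology between two domains.

```python
from itertools import combinations


def upgma(names, dist):
    """Plain unweighted pair-group clustering (no rate correction).

    names: list of leaf labels.
    dist:  dict mapping frozenset({a, b}) to the distance between leaves a, b.
    Returns a nested tuple recording the order in which clusters were joined.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to every remaining cluster is the size-weighted mean.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


if __name__ == "__main__":
    # Hypothetical distances between four domains, for illustration only.
    toy = {
        frozenset(("I", "II")): 0.2,
        frozenset(("I", "III")): 0.3,
        frozenset(("II", "III")): 0.3,
        frozenset(("I", "VII")): 0.6,
        frozenset(("II", "VII")): 0.6,
        frozenset(("III", "VII")): 0.6,
    }
    print(upgma(["I", "II", "III", "VII"], toy))
```

Because this plain form assumes equal rates along all lineages, a slowly evolving sequence such as domain III can be misplaced, which is the situation the rate-corrected method cited in the text is meant to address.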

DISCUSSION
The formation of the seven-Kazal domain ovoinhibitor gene has involved the repeated duplication of domain-coding sequences, which suggests that there was a selection pressure for domain duplication. One physiological advantage of a large plasma inhibitor, and thus a possible selection pressure for domain duplication, is that it prevents excretion. Another possible advantage is that with several independent reactive sites of potentially different specificity, a single polypeptide may be able to inhibit a broad spectrum of serine proteinases. In regard to the latter, the P1 amino acids of the Japanese quail ovoinhibitor domains (Laskowski et al., 1978) are identical to those of the corresponding chicken ovoinhibitor domains except for the fifth domain. The P1 amino acid of this domain is a Tyr in Japanese quail but a Phe in chicken, a change unlikely to affect inhibitor specificity. The Japanese quail and chicken ovoinhibitor domains are therefore apparently functionally equivalent. Thus, the ability to inhibit a variety of serine proteinases may be physiologically advantageous. However, while most avian ovoinhibitors can inhibit both trypsin and chymotrypsin, the penguin ovoinhibitor can only inhibit trypsin and the ostrich homolog cannot inhibit either proteinase (Liu et al., 1971). Clearly, the reactive sites of many more avian ovoinhibitors need to be sequenced to determine whether, in general, ovoinhibitors can potentially inhibit a broad spectrum of serine proteinases.

Fig. 7. Comparison of the ovoinhibitor domains to each other and to the ovomucoid domains. In the left panel, the percent homology between the aligned DNA-coding sequences of the ovoinhibitor domains is shown above the diagonal. The percent homology between the aligned amino acid sequences of the ovoinhibitor domains is shown below the diagonal. In the right panel, the percent homology between the aligned amino acid sequences of the ovomucoid and ovoinhibitor domains is shown to the left of the middle vertical line. The percent homology between the aligned DNA-coding sequences of the ovoinhibitor and ovomucoid domains is shown to the right of the vertical line. In comparing a-type and b-type domains, the additional linker peptide amino acids of the a domains were not included in the alignment with b domain nucleotide and amino acid sequences. The percent homologies between two domain sequences were calculated by dividing the number of matches by the number of amino acids or nucleotides in the longest of the two domains. Deletions were considered to be mismatches.
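The rule given in the legend to Fig. 7 for percent homology (matches divided by the length of the longer of the two domains, with deletions counted as mismatches) can be sketched as follows. The function name, the use of "-" as the gap character, and the sequences in the usage line are assumptions made for illustration; the alignment itself is taken as given.

```python
def percent_homology(aligned_a, aligned_b, gap="-"):
    """Percent homology between two pre-aligned sequences.

    Matches are divided by the number of residues in the longer of the two
    ungapped sequences; a column containing a gap never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    longest = max(
        len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, ""))
    )
    if longest == 0:
        raise ValueError("sequences contain no residues")
    return 100.0 * matches / longest


if __name__ == "__main__":
    # Hypothetical aligned nucleotide sequences, for illustration only.
    print(round(percent_homology("ACGT-A", "ACCTTA"), 1))
```

Dividing by the longer sequence means that a deletion lowers the score in the same way as a substitution, which keeps comparisons between a-type and b-type domains of different lengths on a common scale.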

We have previously suggested that the primordial Kazal inhibitor gene may have consisted only of the first exon of the inhibitory domain (Stein et al., 1980). This exon encodes the reactive site and five of the six conserved cysteines. Addition of the second exon, which encodes the sixth cysteine, probably created a more efficient inhibitor as breakage of disulfide bonds tends to decrease inhibitor activity. That the intradomain intron occurs in an identical location within the coding sequences of the seven ovoinhibitor domains and the three ovomucoid domains supports the hypothesis that the intron was created by the addition of the second exon of the domain. Gilbert (1978) has proposed that exons code for polypeptides that have a discrete structure or function. As the tertiary structures of the Kazal inhibitors, bovine pancreatic secretory trypsin inhibitor (Bolognesi et al., 1982) and the third domains of Japanese quail (Papamokos et al., 1982), turkey (Read et al., 1983), and silver pheasant (Bode et al., 1985) ovomucoids are known, it is possible to examine this hypothesis for the Kazal domains. The tertiary structures of the four Kazal inhibitors are similar. All contain similar elements of secondary structure, a triple-stranded anti-parallel β sheet and an α helix. The reactive sites, which have a similar geometry but are not part of the β sheet or α helix, occur near the surface of the protein. In the inhibitors the α helix is followed by a surface loop. As the structures of these relatively divergent inhibitors are very similar, it seems reasonable to assume that the more closely related chicken ovomucoid and ovoinhibitor domains would have structures which are similar to that of the Japanese quail, turkey, and silver pheasant ovomucoid third domains. With this assumption, the first exon of a chicken ovomucoid or ovoinhibitor domain would encode the two long strands of the β sheet and most of the α helix.
As the exact end of the α helix is unclear in the ovomucoids, particularly silver pheasant third domain (Bode et al., 1985), it is difficult to predict how much of the α helix would be present in the polypeptide encoded by the first exon of a domain. The intradomain introns are therefore apparently similar to several of the chicken pyruvate kinase and triose phosphate isomerase introns (Longberg and Gilbert, 1985; Marchionni and Gilbert, 1986) in that they fall between, but not within, gene sequences encoding stretches of α helix or β sheet. The second exon of any of the domains would encode the small third strand of the β sheet. In addition, there would be many hydrogen bonds formed between the polypeptides encoded by the first and second exons of a domain. It seems possible, then, that the polypeptide encoded by the first exon can fold into a functional structure that is stabilized by interactions with the polypeptide encoded by the second exon of a domain. The structure of the Kazal inhibitors, therefore, does appear to be consistent with the hypotheses of Stein et al. (1980) and Gilbert (1978). The amino acid sequence and geometry of the reactive site region of the streptomyces subtilisin inhibitor are very similar to those of the apparently otherwise unrelated Kazal inhibitors (Papamokos et al., 1982). If this similarity is due to divergent evolution, then the formation of the primordial streptomyces subtilisin inhibitor gene may have involved duplication of the first exon of the Kazal inhibitory domain.
All known Kazal inhibitors are secreted proteins. This suggests that the acquisition of the exons coding for the signal peptide was probably one of the initial events in the formation of the primordial Kazal inhibitor gene. In support of this proposal, the first intron of both the ovoinhibitor and ovomucoid genes falls in an identical position within the coding sequences of the signal peptide. That the dog submandibular Kazal inhibitor consists of one a and one b domain has suggested to us that the duplication which formed the primordial two-domain inhibitor must have occurred prior to the mammalian-bird divergence, about 300 million years ago (Stein et al., 1980). Analysis of the nucleotide and amino acid sequences of the ovoinhibitor domains supports the hypothesis that the ancestral two-domain ovoinhibitor gene consisted of one a-type and one b-type domain. The alligator genome appears to code for at least two high molecular weight Kazal inhibitors, a four-domain ovomucoid and an inhibitor present in plasma. Preliminary experiments suggest that the latter has an amino-terminal sequence homologous to chicken ovoinhibitor. These observations imply that the duplication of the postulated ovomucoid-ovoinhibitor two-domain ancestor must have occurred prior to alligator-bird divergence, which has been estimated to be about 250 million years ago (Norman, 1985). The subsequent duplication of the a but not the b domains in the ovoinhibitor and ovomucoid ancestral genes is probably because the a domain boundaries correspond to exon-intron junctions whereas the b domain coding sequences are contiguous with the 3'-noncoding sequences. That is, expression of a duplicated b domain would require mutation of the translation stop codon that immediately follows the codon for the important sixth cysteine, as well as creation of a splice site.
As domain boundaries correspond to exon-intron junctions, recombination within interdomain introns could have resulted in domain duplication. Although the interdomain introns of the ovoinhibitor gene do not appear to be homologous, duplication of domain exons should also duplicate intron sequences, and it is therefore likely that at one time some or all of the interdomain introns were homologous. Thus, in the gene for the chicken Kazal inhibitor ovoinhibitor, as in the related ovomucoid gene, introns separate functional protein domains, an arrangement which may have facilitated domain duplication.