Molecular characterization of the murinoglobulins.

Proteinase inhibitors of the alpha 2-macroglobulin (alpha 2M) type, although well characterized in vitro, still evade a precise description of their actual role in vivo. The main reason for this is the absence of any clinical evidence for the malfunctioning of alpha 2M in humans. Moreover, despite their ubiquitous presence in animals of very different taxa, animal models are notoriously absent in this field. With the advent of transgenic animals an important tool became available in this respect. As a first step in this direction we are analyzing at the molecular level all the members of this proteinase inhibitor family in the mouse. To retrieve related sequences we screened a mouse liver cDNA library with human alpha 2M cDNA. The sequences from two isolated clones partially coded for a protein with a high degree of sequence identity with human alpha 2M, rat alpha 2M, and rat alpha 1I3. Protein sequence data from the large and small subunits of mouse alpha 2M and of the protein isolated from mouse plasma allowed us to designate the clones as coding for murinoglobulin (MUG), an alpha 2M-related single-chain proteinase inhibitor. Rescreening resulted in the isolation of 24 clones, of which 21 were related or identical to the original MUG clones. Restriction analysis led to three groups of clones of which representative members were sequenced. Two highly homologous cDNA sequences were derived, coding for proteins that displayed the typical features of alpha 2M-type proteinase inhibitors: the overall size, the positions of a putative bait region and of the internal thiol ester, and the positional conservation of cysteine residues and putative asparagine-glycosylation sites. A third related member for which only one incomplete cDNA clone was obtained and sequenced, proved to be aberrant: the bait region contained what appeared to be an intron which escaped proper splicing. 
A second apparent intron was present at the 5' end of the cDNA while a frameshift mutation near the 3' end (insertion of a G) caused premature termination of the reading frame when compared to the other MUG sequences. These features were confirmed from an isolated genomic clone and extended at the genomic level: the corresponding gene, a transcriptionally weakly active pseudogene, contained the small intron but as part of a larger intron. The presence of suitable intron/exon splice sites shows that a relatively small part of the intron is being introduced as an exon in the mRNA.

Although the resulting reading frame is not open, which makes this mRNA useless, the mechanism as such is highly intriguing since it occurs in the bait region, the least conserved sequence in all members of the α2M family.

* This investigation was supported by Grant 3.0069.89 from the Fonds voor Geneeskundig Wetenschappelijk Onderzoek, by a grant of the Interuniversity-network for fundamental research (1987-1991) from the Belgian government, and by a grant from the Levenslijnaktie Nationaal Fonds voor Wetenschappelijk Onderzoek (Belgium). The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequences reported in this paper have been submitted to the GenBank/EMBL Data Bank with accession numbers M65736, M65238, and M65237.
The mouse plasma protein, murinoglobulin, is characterized as a single-chain (190 kDa) proteinase inhibitor belonging to the family of α2-macroglobulin (1,2). This family of plasma proteins with comparable structural and functional characteristics is present in the circulation of vertebrates and invertebrates but also in the egg white of birds and reptiles (3,4). As a general mechanism, a proteinase activates the inhibitor by specific proteolysis in the bait region, which, by an unknown mechanism, leads to reaction at the cysteinyl-glutamyl internal thiol ester site and to a conformational change, whereby the proteinase is trapped and/or covalently bound to the inhibitor. This binding is mainly steric in the tetrameric forms of the family, with human α2M as the best studied member, but is covalent in the single-chain inhibitors, thereby accentuating their structural homology with complement components C3 and C4 (3). While in the tetrameric proteinase inhibitors steric inhibition is sufficiently strong, monomeric forms need a covalent linkage between the activated glutamyl residue of the original thiol ester and a terminal amino group of a lysine or another nucleophilic group on the proteinase for inhibition to be effective (5-8).
The thiol ester and the receptor-recognition domain are highly conserved regions among the different αMs, whereas the bait region is a stretch of amino acids that is unique for every member of the αM family and contains specific sites for various proteinases. The bait region determines the preference of these inhibitors for different sets of proteinases. Finally, expression of the receptor-binding domain is responsible for rapid elimination of the αM-proteinase complexes (4).
In the mouse the murinoglobulin originally described (1) is a single-chain proteinase inhibitor, thought of as the murine equivalent of α1I3 in the rat. In the absence of molecular data this needs to be demonstrated. In a study aimed at the molecular cloning of all members of the αM family in mice we have isolated several different cDNA clones. In this report we describe the isolation and complete sequencing of two murinoglobulins at the cDNA level. Characterization of a third, partial cDNA and of part of the corresponding gene proved it to be derived from a pseudogene. Only one murinoglobulin was isolated from mouse plasma, the NH2-terminal sequence of which corresponded to the one predicted by translation of the most prominent cDNA.

METHODS
Screening-λ phage were plated on the appropriate Escherichia coli hosts (Y1090 for λgt11, BB4 for λZap, and KW251 for λGEM11) and grown on NZYDT-agar substrate at 37 °C until plaques were of appropriate size. Replicas were made on nylon filters (Du Pont Colony Screen) and hybridized with the probes indicated. The cDNA probes were isolated inserts in low melting agarose, labeled by hexanucleotide-mediated incorporation of [32P]dCTP (17,18). Oligonucleotides were end-labeled with [32P]dATP or with digoxigenin-dUTP (Boehringer Mannheim).
Subcloning of λgt11 and λGEM11 Inserts-λ DNA containing the putative inserts was digested with EcoRI (for λgt11 inserts) or SacI (for λGEM11 inserts), and the excised inserts were ligated into pUC18.
Excision and Rescue of λZap Inserts-The putative murinoglobulin cDNA inserts were excised from λZap DNA by the automatic excision process as described by the manufacturer. The colonies, containing the rescued pBluescript plasmid with the cloned cDNA insert, were processed by standard procedures (19,20).
DNA Sequencing-General methods of DNA manipulation were according to Maniatis (19) and Davis (20). Purified DNA was sequenced on both strands by the dideoxy chain-termination method of Sanger (21) (Sequenase, U.S. Biochemicals). Deletion clones were made with different restriction enzymes. Remaining gaps were sequenced with internal oligonucleotides as primers, synthesized on either Millipore or ABI equipment according to standard procedures provided by the manufacturers. The DNA sequence data were collected and analyzed with the Genepro (version 4.2) program (Riverside Scientific). A total of 15778, 11679, and 12335 nucleotides were sequenced from the murinoglobulins 1, 2, and 3, respectively.
Murinoglobulin Isolation-Heparinized mouse plasma, collected from anesthetized animals by cardiac puncture, was treated with polyethylene glycol (4% final concentration). After centrifugation the clear supernatant was separated by gel filtration (ACA 34). The peaks corresponding to mouse α2M and to murinoglobulin were pooled individually. Murinoglobulin was isolated by hydrophobic interaction chromatography on a Phenyl-TSKPW5 column, essentially using the same buffer system as described for α2M (22). Mouse α2M was isolated and characterized as described (23). NH2-terminal sequencing of murinoglobulin and of the 165- and 35-kDa subunits of mouse α2M was performed on an Applied Biosystems Model 477A Sequencer, with on-line phenylthiohydantoin identification (Model 120A). SDS-polyacrylamide gel electrophoresis of proteins was as described (23).

RESULTS
Isolation of Murinoglobulin cDNA Clones-A mouse liver cDNA library in λgt11 was screened with a partial cDNA probe from human α2M (24). The inserts of two positive plaques, designated 15 (2.6 kb) and 20 (2.9 kb), were subcloned in pUC18. Both clones were sequenced by the dideoxy chain-termination method, after preparation of deletion clones with suitable restriction enzymes. Remaining gaps were sequenced with internal synthetic oligonucleotides. The two clones overlapped over 2081 nucleotides of identical sequence, while combined they resulted in a sequence of 3532 nucleotides, including a 29-nucleotide poly(A) tail (Fig. 1A). The derived amino acid sequence showed considerable differences with the NH2-terminal amino acid sequence of the 35-kDa subunit of mouse α2M. Comparison with the published cDNA sequences of human α2M, rat α2M, and especially rat α1I3 suggested that the clones represented murinoglobulin.

FIG. 1. Not all the restriction sites of the tested enzymes are shown, but only those sites missing in one of the classes of murinoglobulins. Open circles denote the absence of the restriction site.
Screening of a second mouse liver cDNA library (in λZap) using clone 15 as a probe produced 24 different positive clones. Characterization by restriction mapping with the enzymes EcoRI, SacI, NcoI, HindIII, and EcoRV revealed four different groups of clones. One group of three clones was shown to correspond to mouse α2M. Of the others, a large group of 17 clones showed the same restriction pattern as the original clone 15, which we designated murinoglobulin 1 (MUG1). Three clones displayed a different restriction pattern (missing one EcoRI, one SacI, two NcoI, and one EcoRV restriction site, Fig. 1B) and were classified as MUG2. Only one clone, LZ52, displaying a third restriction pattern (missing two SacI sites, one HindIII, and one EcoRV site compared to MUG1, Fig. 1B), was isolated and designated MUG3.

* F. Van Leuven and S. Torrekens, unpublished results.
Screening of the same cDNA library with a restriction fragment of human α2M revealed 36 positives, of which two corresponded to MUG1 (Fig. 1C). Another screening for more full-length MUG clones was done with a 5'-terminal 889-bp restriction fragment of LZ37 (MUG2). Ten positive clones were obtained, seven corresponding to MUG1 and three to MUG2 (Fig. 1D). The NcoI restriction site is remarkable. It was present in LZ1 at position 488-493, but missing in the seven MUG1 clones of the third screening. The presence of this NcoI recognition site in LZ1 was confirmed by sequencing. In LZ3/5, LZ3/6, and LZ3/10 nucleotide sequencing showed a cytosine to thymine point mutation at position 489, causing the disappearance of the recognition sequence for NcoI. This polymorphism for MUG1 indicates at least a two-allele system.
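The loss of the NcoI site through a single C-to-T transition can be illustrated with a minimal sketch. It assumes only the standard NcoI recognition sequence, CCATGG; the flanking bases and the helper name `find_sites` are hypothetical and are not taken from the MUG1 sequence.

```python
# NcoI recognition sequence (standard, palindromic 6-bp site).
NCOI_SITE = "CCATGG"


def find_sites(seq: str, site: str = NCOI_SITE) -> list[int]:
    """Return the 1-based start positions of every occurrence of `site` in `seq`."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + 1)
        start = seq.find(site, start + 1)
    return positions


# Hypothetical flanking bases; only the 6-bp site reflects the text.
allele_with_site = "GATTA" + "CCATGG" + "TCAGA"
# C -> T at the second base of the site, analogous to the change
# reported at position 489 of the site spanning positions 488-493.
allele_without_site = "GATTA" + "CTATGG" + "TCAGA"

print(find_sites(allele_with_site))     # [6]
print(find_sites(allele_without_site))  # []
```

A transition at the second position of the site is enough to abolish recognition, which is consistent with the restriction maps distinguishing the two MUG1 alleles.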
Sequencing of Murinoglobulin cDNA Clones-The complete sequence of MUG1 cDNA was obtained by combining three partially overlapping clones, 15, 20, and LZ1, with a total length of 4626 base pairs (Figs. 2 and 3). For MUG2, sequence data from two clones, LZ23 and LZ37, were combined to obtain a full-length sequence of 4566 base pairs (Figs. 2 and 4). Clone LZ23 was sequenced completely. Combined with sequencing of the 5'- and 3'-terminal fragments of LZ37, which is a full-length clone of 4.5 kb, this yielded the complete MUG2 sequence. To ascertain that both clones belonged to the same variant murinoglobulin, the bait region of clone LZ37 and the thiol ester domain, which is distinctive because of a deletion of 75 nucleotides relative to MUG1, were also sequenced to confirm their identity with the corresponding sequences from LZ23.

In all cases we employed the same sequencing strategy described above, which combines the preparation of deletion clones after subcloning of the λ inserts into pUC18 (for λgt11 inserts) or after rescue into pBluescript plasmids (for λZap inserts) and the synthesis of specific internal oligonucleotides as sequence primers. Since the different murinoglobulins are highly homologous, many primers could be used for the different variants.
Analysis of MUG1 and MUG2 cDNA Sequences-MUG1 cDNA contained an open reading frame of 4428 nucleotides, whereas in MUG2 only 4353 nucleotides were in-frame. An ATG start codon was sequenced in both variants and part of the 5'-nontranslated region was obtained (39 nucleotides for MUG1 and 64 for MUG2). In the 3'-untranslated region a canonical AATAAA polyadenylation signal is present 15 nucleotides upstream from the poly(A) tail in MUG1, whereas only 13 nucleotides separate them in MUG2 (Figs. 3 and 4). At the nucleotide level a sequence identity of 93% is observed between the two full-length murinoglobulin variants. Compared to rat α1I3 an overall sequence identity of 77% is seen with the prototype α1I3/2J, whereas only an identity of 50% is observed with mouse α2M. Translation of the nucleotide sequences in open reading frames of 1476 (MUG1) and 1451 (MUG2) codons (Figs. 3 and 4) yields an overall amino acid sequence identity of 89%. The lowest level of amino acid sequence identity, only 56%, is observed in the putative bait region (position 682-734 in MUG1). In other regions the homology is almost complete, which is most notable at both the NH2- and COOH-terminal ends. A rather large deletion of 25 amino acids is typical for MUG2 (corresponding to residues 1094-1120 in MUG1).
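The relation between the reported open reading frame lengths, the codon counts, and the size of the MUG2 deletion can be checked with a few lines of arithmetic. This is only a consistency check on the figures quoted above; the variable names are illustrative.

```python
# Open reading frame lengths in nucleotides, as reported in the text.
orf_nt = {"MUG1": 4428, "MUG2": 4353}

codons = {}
for name, length in orf_nt.items():
    # An open reading frame must contain a whole number of codons.
    assert length % 3 == 0, f"{name}: length is not a multiple of 3"
    codons[name] = length // 3

deletion_nt = orf_nt["MUG1"] - orf_nt["MUG2"]
deletion_codons = codons["MUG1"] - codons["MUG2"]

print(codons)           # {'MUG1': 1476, 'MUG2': 1451}
print(deletion_nt)      # 75
print(deletion_codons)  # 25
```

The 75-nucleotide difference is a multiple of three, so the deletion in MUG2 removes 25 codons without disturbing the reading frame.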
The predicted sequence of MUG1 contains 26 cysteine residues, one more than MUG2. In both murinoglobulin variants 24 of these residues are positionally conserved. In variant MUG2 the cysteine residues 402 and 437 are missing, while an extra cysteine residue is present at position 1273 (Figs. 3, 4, and 6). From the derived amino acid sequence the calculated mass of the mature proteins is 165061 for MUG1 and 162343 for MUG2. The predicted potential asparagine-glycosylation sites amount to 11 in MUG1 and only nine in MUG2; of those, eight are identically positioned in both MUG variants (Figs. 3, 4, and 7). Compared to human α2M (24), for which disulfide bonds and glycosylation sites were experimentally determined (25), the homology is striking (Figs. 6 and 7).
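Potential asparagine-glycosylation sites are commonly predicted from the consensus sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below assumes that rule; the text does not state which criterion was applied here, and the peptide used is a hypothetical toy sequence, not a MUG sequence.

```python
import re

# Consensus N-glycosylation sequon: N, any residue except P, then S or T.
# The lookahead lets overlapping sequons be reported individually.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_sequons(protein: str) -> list[int]:
    """Return 1-based positions of asparagines in potential N-X-S/T sequons."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


# Hypothetical toy peptide for illustration only.
toy_peptide = "MKNGSAANPTLLNNTS"

print(find_sequons(toy_peptide))  # [3, 13, 14]
```

Asn-8 in the toy peptide is skipped because it is followed by proline. Comparing the position lists from two aligned variants would give the number of identically positioned sites.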
Expression of mRNA-Despite direct detection by Northern blotting of mRNA species of about 5 kb extracted from mouse liver and other tissues, we could not conclusively show the differential presence of specific mRNA of the different MUG variants. Cross-reaction with the abundant mouse α2M mRNA and with the other variants prevented unequivocal results. Therefore, we attempted to obtain this information by PCR amplification. Total RNA, isolated from mouse liver and also from uterus, was reverse-transcribed, and the bait regions of different MUG variants were amplified by the PCR method. Different sets of oligonucleotide primers were devised based on the nucleotide sequence information of the bait regions, which are the most typical and diverse regions of these proteins. The results supported the evidence that the three different MUG variants were expressed in the liver, which substantiates their isolation from the cDNA library prepared from this tissue. Signals corresponding to the three different types of cDNA isolated were also found by hybridization of the PCR-amplified fragments with three different oligonucleotide primers, typical and specific for each of the bait regions of the three MUG variants. In all cases the variant MUG1 was present as the predominant species. Additionally, MUG2 and MUG3 mRNA was present in both tissues but in much lower concentration than MUG1 mRNA.
Plasma Murinoglobulin: Protein Isolation and Sequencing-Murinoglobulin was isolated from fresh, heparinized mouse plasma obtained by cardiac puncture from anesthetized animals. Isolation was done by a simple procedure which includes polyethylene glycol precipitation, gel filtration, and hydrophobic interaction chromatography. A single protein species was obtained as judged from analytical gel filtration, rate electrophoresis in 8% native gels, SDS-polyacrylamide gel electrophoresis on gradient gels, and by NH2-terminal sequencing. By these criteria we isolated a protein of apparent molecular mass of 185 kDa in the reduced state (Fig. 8). The native protein was estimated at about 200 kDa and to be of α1 electrophoretic mobility. These characteristics conform to those previously published for murinoglobulin (1, 2) and for the rat α1I3 (6, 9, 12).

FIG. 3. cDNA sequence and derived amino acid sequence of MUG1.
The NH2-terminal sequencing of the purified protein yielded 30 cycles which, with the exception of an undetermined cysteine residue at position 21, matched the predicted sequence of MUG1 from residues 28-57 (Fig. 9). The only difference in this stretch between the two MUG variants, glutamine or histidine at position 53, was clearly seen as a glutamine (26 pmol of approximately 100 pmol of protein initially loaded). Further analysis is needed, however, to assess the presence of trace amounts of the MUG2 variant. Thus far, we were unable to identify by biochemical techniques the equivalent protein in mouse plasma. A contaminating protein of apparent molecular mass of 180 kDa, initially thought of as a second MUG, was excluded as such by NH2-terminal sequencing.
Analysis of the Aberrant Murinoglobulin-The cDNA sequence obtained from the partial clone designated MUG3 contained 3316 base pairs including, at the 3' end, both a poly(A) tail and a typical AATAAA canonical polyadenylation signal (Fig. 5). The sequence identity with both MUG variants is high but not complete. A frameshift mutation is obviously present at position 1461 (compare to MUG1, Fig. 3, position 2763) and is caused by the insertion of a guanine (Fig. 5). This induces a stop of translation in the next codon, disturbing an open reading frame, which prohibits efficient translation.
Two regions, nucleotides 1-255 and 624-798, are totally different from MUG1 and MUG2 (Fig. 5). No sequence homology was retrieved from the EMBL or GenBank sequence databases consulted. Region 1-255 ends in a typical intron acceptor site, suggesting that this could be an intron which was not spliced. The characteristics of the divergent region 624-798, located in the putative bait region, indicated the possibility that this also might be an intron.
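How a single-base insertion can bring a stop codon into frame is shown in the sketch below, which uses the standard genetic code. The 15-nucleotide reference sequence is a hypothetical toy example chosen so that inserting a G produces an in-frame TGA; it is not taken from MUG3.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate `dna` from its first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical reference reading frame: ATG GCT CAT GAC TGG
reference = "ATGGCTCATGACTGG"
# Insert a single G after the second codon: ATG GCT GCA TGA CTG G
mutant = reference[:6] + "G" + reference[6:]

print(translate(reference))  # MAHDW
print(translate(mutant))     # MAA
```

In the toy mutant the inserted G shifts the frame by one base, so the codon after the one containing the insertion reads TGA and translation terminates.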
This hypothesis, that the two highly different regions were actually intron sequences, was confirmed by the isolation of a genomic clone coding for this cDNA. This genomic clone of 13.5 kb was digested with SacI and EcoRI (Fig. 10). After subcloning into pUC18, the exons in the 5'-terminal part of the clone were sequenced using synthetic oligonucleotides based on cDNA sequence, to define the exon-intron boundaries. The size of the introns was determined by PCR amplification using the same oligonucleotide primers. Agarose gel electrophoresis of the amplified introns showed fragments of 1.2, 0.7, and 2.2 kb (Fig. 10). The sequence analysis confirms the presence of the presumed intron at the 5' end of the MUG3 cDNA (Fig. 10). Moreover, the intron in the bait region is observed but, interestingly, is buried in a larger intron. This was demonstrated by sequencing of the subcloned gene with specifically designed oligonucleotide primers, based on the MUG3 presumed intron cDNA sequence (Fig. 5, positions 661-676 and 727-745 antisense). Apparently this part of the intervening sequence, which shows a typical splice acceptor site (26) but no consensus splice donor site, can behave as an exon under certain conditions.
Also confirmed in this genomic clone is the frameshift mutation (position 1461), which in itself strengthens the evidence of identity of the genomic clone as the gene responsible for the aberrant cDNA retrieved in the first place.

DISCUSSION
[...] clues are provided to the functioning of these inhibitors we opted to develop an animal model. In the [...] the techniques of genetic manipulation have culminated into a tool by which it will be possible to change by specific mutations the function of a protein. Of further considerable importance is the physiological behavior of mouse α2M, which is not unlike human α2M (constitutively present in comparable concentrations, cross-reacting immunologically, similar proteinase inhibition spectrum). From this perspective we have embarked on the characterization at the molecular level of all the members of this family of proteinase inhibitors, not only mouse α2M but also murinoglobulin, the single chain inhibitor previously described (1, 2).
By screening a mouse liver cDNA library with a human α2M probe, we isolated not only mouse α2M clones (which is reported elsewhere) but also different variants of murinoglobulins. Unequivocal identification as murinoglobulins was based upon NH2-terminal sequencing of a murinoglobulin isolated from plasma and of the large and small subunits of mouse α2M and by comparison with the published cDNA sequence of α1I3, the single chain inhibitor in the rat (14,15).

The differences between variants MUG1, MUG2, and the aberrant MUG3 are located predominantly in the bait region. [...] one of the variants of rat α1I3 (14). Point mutations (base substitutions, deletions, and insertions) are present throughout the whole sequence, suggesting that the different cDNAs did not originate by alternative splicing of mRNA derived from one murinoglobulin gene. The more acceptable hypothesis is the existence of different genes. Southern blot analysis of mouse genomic DNA with a 350-bp amplified fragment of the putative bait region, known to react with all three MUG cDNAs, indicated the existence of different MUG genes. Restriction of the genomic DNA with BamHI, EcoRI, and EcoRV revealed in each case four different bands, totaling between 25 and 30 kb. The existence of at least three different genes for murinoglobulins thereby appears very likely.
That MUG1 and MUG2 are members of the αM family is further evidenced by the presence of a typical thiol ester site, which is involved in covalent binding of the proteinase. In monomeric αM this site is an absolute necessity for the binding of proteinases by the formation of a covalent bond. This was shown by the nearly complete loss of proteinase binding in monomeric rat α1I3 after amine-induced thiol ester cleavage (5,6). Tetrameric αMs do bind proteinases under the same conditions, because steric inhibition is more important in these molecules (4,7).
Including the cysteine residue present in the thiol ester site, 24 cysteine residues are positionally conserved in MUG1 and MUG2. The same 24 cysteine residues are also conserved in the prototype rat α1I3/2J (14). Human α2M, of which the disulfide bridge structure was determined experimentally (3), and mouse α2M have essentially the same cysteine residue positions. It therefore seems likely that the same disulfide bridge pattern is present also in the murinoglobulins. Notable is the absence of cysteine residue 255 in MUG1 and MUG2, as well as in rat α1I3. In human α2M this cysteine residue is known to take part in an interchain disulfide bridge, forming the covalent α2M dimeric subunit. In the monomeric MUG and α1I3 proteinase inhibitors there is clearly no structural reason to conserve this cysteine residue at this position. From comparison of the protein sequence predicted from the cDNA sequence with the one obtained from NH2-terminal sequencing of the isolated protein, a 27-amino acid signal peptide is observed, typical for secreted plasma proteins (27). Compared to the one in rat α1I3, which is only 24 residues in length (14), the signal peptides display 77% sequence identity (21 out of 27 residues), which is comparable to the overall sequence identity. The primary structure of eukaryotic signal peptides displays three general domains: a positively charged n domain, a hydrophobic h domain, and a polar c domain (28,29). Three of the six differences between MUG and α1I3 are present in the short n domain.
The two variant murinoglobulins differ most extensively in their putative bait regions, where an amino acid sequence identity of only 56% is observed, as opposed to an overall protein sequence identity of 89%. It has been proposed that the divergence in this region, which is a general feature of the αM family as observed in human α2M and pregnancy zone protein; rat α2M, α1M, and two variants of α1I3; and in mouse α2M and the MUG variants, could be due to positive Darwinian selection, thereby creating new regions for the attacking proteinases as postulated in other systems (30-32).

Another explanation for divergence in this region is the neutralist theory, suggesting that the bait region is located in a loop which is not of crucial importance for the protein structure, so that mutations in this region are not selected against, as opposed to other parts of the protein (14,30). When comparing the inhibitor characteristics of mouse α2M and MUG (1, 2), as well as the two variants of α1I3 (6, 30), essentially the same inhibitory patterns are found. These findings make the second theory more acceptable. A combination of both theories is of course possible and must await verification by experiments.
For the third aberrant MUG no full-length, normal translatable transcript was obtained from two liver cDNA libraries. Isolation of a genomic clone corresponding to the gene responsible for the aberrant cDNA sequence of MUG3 did not conclusively settle the question as to the existence of a normal counterpart. The frameshift mutation which was found in the cDNA clone at position 1461 is also present in this genomic clone, located at an intron-exon boundary. The "intron" sequence (175 bp) which was present in the bait region of the aberrant cDNA clone and which we defined by comparison to the other MUG variants was demonstrated to be present in the genomic clone as part of a much larger intron of 2.2 kb. The intron sequence is thus to be regarded as an exon, though it shows no open reading frame. These results suggest that the third aberrant MUG is derived from a transcriptionally active pseudogene which was partially isolated as the genomic clone. It is impossible to decide at this point in time if this pseudogene is complete and derived from the gene of murinoglobulins 1 or 2 or from a third active murinoglobulin gene for which we have no cDNA evidence yet. Analysis of the putative bait region of MUG3, leaving out the intron, and comparison to the bait regions of the "active" murinoglobulins 1 and 2, reveals the following (Fig. 11): of 46 comparable residues, 19 residues (41%) are positionally conserved in all three MUGs (including 5 prolines which are also seen in the same positions in the rat α1I3 variants), while 13 residues (28%) are uniquely present in MUG3. Of the remaining 14 residues, 5 amino acids (11%) are shared between MUG1 and MUG3, while 9 (20%) are common between MUG3 and MUG2.
Although such considerations do not take into account the chemical similarities of different amino acids nor their relative importance in terms of proteinase substrate specificity, they do demonstrate that MUG3 is not significantly more closely related to either of the other murinoglobulins. This would mean that a third active gene, still to be discovered, is the origin of the pseudogene or has an allele from which the aberrant cDNA clone, LZ52, was transcribed. Further investigation at the genomic level of different mouse strains and examination of expression in tissues other than liver is needed to definitely answer these questions.
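The residue counts for the bait-region comparison (Fig. 11) can be tallied to confirm that they account for all 46 comparable positions and reproduce the quoted percentages. The category labels below are paraphrases introduced for illustration.

```python
# Counts of bait-region residues by conservation class, as given in the text.
counts = {
    "conserved in MUG1, MUG2, and MUG3": 19,
    "unique to MUG3": 13,
    "shared by MUG1 and MUG3 only": 5,
    "shared by MUG2 and MUG3 only": 9,
}

total = sum(counts.values())
percentages = {label: round(100 * n / total) for label, n in counts.items()}

print(total)  # 46
for label, pct in percentages.items():
    print(f"{label}: {pct}%")
```

The rounded values are 41%, 28%, 11%, and 20%, matching the text. MUG3 shares 5 positions exclusively with MUG1 and 9 exclusively with MUG2, against 13 positions where it differs from both.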
From an evolutionary standpoint, the comparison of these inhibitors at the molecular level in rat and mouse is interesting. The homology in these proteins is undeniable and important. Nevertheless we also note remarkable differences. First, in the mouse we have not yet encountered the equivalent of the spectacular acute phase reactant which is α2M in the rat. Second, whereas in the rat the α1I3 inhibitors are among the most prominent plasma proteins in quantity, in the mouse the level of murinoglobulin is much lower. We estimate its concentration under normal conditions at between 0.5 and 1 mg per ml plasma, which is of the same order as mouse α2M and more comparable to other members in other species. Although these differences could be only quantitative in nature, it will take knowledge of all the family members at the gene level to appreciate fully all the evolutionary implications. Besides the complete analysis of the genes involved in the now complex αM family, the problem of expression needs to be resolved. To understand the physiological need for such a set of slightly different proteinase inhibitors of the same type, we need information on their spatial and temporal expression in vivo. This will in itself not only yield valuable information but is necessary if a useful transgenic model is to be constructed.