Mannose-binding proteins isolated from rat liver contain carbohydrate-recognition domains linked to collagenous tails. Complete primary structures and homology with pulmonary surfactant apoprotein.

Preparations of mannose-binding protein isolated from rat liver contain two distinct but homologous polypeptides. The complete primary structures of both of these polypeptides have been determined by sequencing of peptides derived from the proteins, isolation and sequencing of cDNAs for both proteins, and partial characterization of the gene for one of the proteins. Each polypeptide consists of three regions: (a) an NH2-terminal segment of 18-19 amino acids which is rich in cysteine and appears to be involved in the formation of interchain disulfide bonds which stabilize dimeric and trimeric forms of the protein, (b) a collagen-like domain consisting of 18-20 repeats of the sequence Gly-X-Y and containing 4-hydroxyproline residues in several of the Y positions, and (c) a COOH-terminal carbohydrate-binding domain of 148-150 amino acids. The sequences of the COOH-terminal domains are highly homologous to the sequence of the COOH-terminal carbohydrate-recognition portion of the chicken liver receptor for N-acetylglucosamine-terminated glycoproteins and the rat liver asialoglycoprotein receptor. Each protein is preceded by a cleaved, NH2-terminal signal sequence, consistent with the finding that this protein is found in serum as well as in the liver. The entire structure of the mannose-binding proteins is homologous to dog pulmonary surfactant apoprotein.

Proteins capable of recognizing specific oligosaccharides have been isolated from a number of different organs in a range of vertebrate species (1). These proteins fall into several categories. One group consists of integral membrane proteins which are involved in the intracellular routing of glycoproteins to specific destinations, particularly the lysosomes. Examples of these proteins include the mammalian hepatic receptor for asialoglycoproteins, the avian hepatic receptor for agalactoglycoproteins, and the ubiquitous mannose-phosphate receptors (1, 2). A second group of small, water-soluble galactose-binding proteins have been extracted from many different tissues; although they are not integral membrane proteins, their localization within the cytoplasm or outside the cells remains a matter of controversy (3, 4). Finally, the serum amyloid protein, a precursor to the protein component of amyloid plaques and a member of the pentraxin group of serum proteins, specifically binds to galactose-containing polymers (5).
* This work was supported by Grants GM30823 and AM20595 from the National Institutes of Health and Grant 82-A-127 from the Searle Scholars Program of the Chicago Community Trust. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
To whom correspondence should be addressed: Dept. of Biochemistry and Molecular Biology, University of Chicago, 920 East 58th Street, Chicago, IL 60637.
The primary structures of several of these carbohydrate-binding proteins are known. Five membrane-bound glycoprotein receptors, including two distinct asialoglycoprotein receptors isolated from rat liver and two found in human liver, as well as the chicken agalactoglycoprotein receptor, are homologous to each other, particularly in a COOH-terminal domain of 130-150 amino acids which contains the carbohydrate-binding activity (6-8). Each of these proteins is anchored to the cell membrane by a membrane-spanning stretch of hydrophobic amino acids and an NH2-terminal cytoplasmic tail (8). In contrast, the primary structures of mouse and human serum amyloid protein are highly homologous to each other and to the acute phase reactant referred to as C-reactive protein (9), but do not appear to be structurally related to the membrane receptors.
A mannose-binding protein (MBP) isolated from rabbit and rat liver was originally believed to be involved in uptake of mannose-containing glycoproteins by macrophages (9-12); however, the fact that this protein is found exclusively in the parenchymal cells of the liver rules out this function (13, 14). An MBP was subsequently isolated from the serum of rabbits and rats as well as humans (15, 16); the serum and liver MBPs appear to be identical based on binding specificity, pH and calcium ion dependence of binding, immunological cross-reactivity, and identical mobilities on SDS-polyacrylamide electrophoresis gels.
The MBPs share several important characteristics with the membrane-bound receptors found in hepatocytes, such as calcium ion dependence and loss of binding activity below pH 6.0. On the other hand, the MBPs are water-soluble and can be extracted from liver in the absence of detergent, which is in contrast to the membrane-bound receptors. The presence of MBP in the circulation suggested that this protein might be more related to serum amyloid protein.
Structural analysis of MBP isolated from rat liver was undertaken to determine the degree to which this protein is related to the previously characterized carbohydrate-binding proteins. The results of this analysis reveal that there are two rat MBPs which are homologous to each other and to the carbohydrate-binding portions of the membrane receptors. The remainder of each polypeptide consists largely of collagen-like sequences. Some possible functions for MBPs suggested by their structural organization are discussed.

Organization of MBP Polypeptides
Multiple Polypeptides in Preparations of MBP-MBP can be isolated by several different procedures. The central purification step is affinity chromatography either on an immobilized glycoprotein which contains a high proportion of mannose residues (such as yeast mannan or invertase) or on columns constructed directly with mannose coupled to agarose. In earlier experiments, MBP was extracted from an acetone powder of rat liver with Triton X-100 (9, 11). Subsequently, it was found that extraction with high salt alone releases MBP in good yield (10, 13). The material isolated by all of these procedures appears to be the same: in each case, a polypeptide of Mr = 29,000 is isolated in what appears to be homogeneous form. However, upon reduction and alkylation, the presence of two polypeptides with slightly different mobilities on SDS-gel electrophoresis is revealed (see Fig. 1). This figure also demonstrates that the species which migrates faster (designated MBP-A) is enriched in a fraction which precipitates upon dialysis against water, and is essentially absent from the fraction which remains soluble. This provides a useful means of partially resolving the two different species of MBP. The two forms of MBP have been found in preparations of MBP in which either invertase-Sepharose or mannose-Sepharose was employed as the affinity resin after starting with either detergent or high salt extraction. In all cases, the more slowly migrating species (MBP-C) is the predominant form.
Direct evidence for the presence of two distinct polypeptides in the MBP preparations was obtained by sequence analysis of the two fractions shown in Fig. 1. The results, shown in Table I, demonstrate that a unique sequence is obtained for the pure MBP-C fraction, whereas a double sequence is obtained when some MBP-A is present with the MBP-C. Because the MBP-C sequence can be determined by itself, the sequence of MBP-A can be derived from the double sequence of the mixed polypeptides. The two sequences, although distinct, show some homology (see below).
In the course of generating additional sequence information for MBP-A and C, a number of tryptic peptides were isolated by two-dimensional thin layer peptide mapping (Fig. 2) and analyzed by automated Edman degradation (Tables II and III). Two of these peptides (T2 and T7) consist of repeats of the sequence Gly-X-Y, which is characteristic of the sequences found in portions of collagen which form triple helices (33). In addition, several of the Y positions are occupied by 4-hydroxyproline residues, another characteristic of collagenous sequences (33). The amino acid composition of the MBP preparations reveals a high proportion of glycine (9), as would be expected if it contains a collagen-like sequence. When a special analysis was conducted for 4-hydroxyproline, it was found to be present at 1.3 mol %. These findings are consistent with the suggestion that a portion of the MBP sequence forms a collagen-like triple helix. Such collagenous domains have been found in other noncollagen proteins, including complement protein C1q and the asymmetric forms of acetylcholinesterase (34, 35).
In order to localize the putative collagen-like portion of MBP, bacterial collagenase was used to probe for collagen-like sequences. This enzyme preferentially cleaves polypeptides after proline in the sequence Gly-X-Pro-Gly (36). The results of this digestion are shown in Fig. 3. Most of the MBP polypeptides are cleaved by collagenase, leaving a fragment (C-1) of apparent molecular weight 20,000. The absence of an additional polypeptide fragment visible on the gels suggests that roughly one-third of each Mr = 29,000 polypeptide is degraded by collagenase. The C-1 fragment can be repurified by affinity chromatography on invertase-Sepharose (Fig. 3, lane 3), which indicates that the mannose-binding domain of the MBP resides in the collagenase-resistant portion of the molecule.
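The cleavage preference described above can be illustrated with a short sketch. The sequence used here is hypothetical (it is not the MBP sequence), and the function name `collagenase_sites` is an assumption made for illustration; the pattern simply encodes cleavage after the proline of Gly-X-Pro-Gly.

```python
import re


def collagenase_sites(seq):
    """Return 0-based indices of the Pro residues after which cleavage is predicted.

    The motif is Gly-X-Pro-Gly (one-letter code G.PG). A lookahead is used
    so that overlapping motifs are all reported.
    """
    return [m.start() + 2 for m in re.finditer(r"(?=G.PG)", seq)]


# Hypothetical sequence: short NH2-terminal segment, a few Gly-X-Y repeats,
# then a noncollagenous tail.
toy = "ECKTCSGAPGKDGPPGEKGLRGWKSY"
sites = collagenase_sites(toy)

# Split the sequence after each predicted cleavage site.
fragments = []
start = 0
for s in sites:
    fragments.append(toy[start:s + 1])
    start = s + 1
fragments.append(toy[start:])

print(sites)
print(fragments)
```

In this toy case the last fragment, which lacks the motif, plays the role of the collagenase-resistant portion.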
Fragment C-1 was subjected to sequence analysis with the results shown in Table III. The first 9 residues of the sequence consist of collagen-like Gly-X-Y repeats, following which the sequence diverges from this pattern. This sequence is completely unrelated to the sequence found at the NH2 terminus of either MBP-A or MBP-C. The yield of the fragment indicates that it must be derived from the more abundant MBP-C. The absence of a second sequence derived from MBP-A probably reflects the low abundance of the MBP-A fragment. The sequence of intact MBP-A could be discerned only when this form of MBP was enriched by the differential precipitation procedure (Fig. 1 and Table I), and was not detected when total MBP was subjected to sequence analysis (data not shown). Thus a collagenase-resistant fragment of MBP-A would not be detectable in this experiment. Taken together, these results suggest that the NH2-terminal one-third of at least MBP-C consists of a triple helical region of protein, and that the carbohydrate-binding portion of the receptor lies in the COOH-terminal two-thirds of the molecule.
Interchain Disulfide Bonds-Evidence for the multimeric nature of MBP was obtained by analysis of the protein under nonreducing conditions. An SDS-polyacrylamide gel of an unreduced preparation of MBP (Fig. 3, lane 4) reveals the presence of higher molecular weight bands with apparent molecular weights of 57,000 and 78,000. These bands appear to represent disulfide bond-stabilized dimers and trimers of the MBP polypeptides. Similar ratios of dimers and trimers have been observed by others (14). Gel filtration analysis of the MBP preparation suggests that the native molecule (Mr = 194,000) consists of six polypeptide subunits (9), which would be consistent with the presence of either two disulfide-linked trimers or three disulfide-linked dimers. Although it is possible that the trimers are formed of MBP-C subunits and the dimers of MBP-A subunits, the staining intensities do not appear to be consistent with this interpretation, since the relative abundance of dimers compared to trimers is greater than the relative abundance of MBP-A compared to MBP-C (Figs. 1 and 3).
The location of the cysteines involved in the disulfide bonds was determined by examining the mobility of the collagenase-resistant portion of MBP-C under nonreducing conditions (Fig. 3, lane 5). The mobility of this fragment is not significantly affected by reduction, indicating that the interchain disulfide bonds are not located within the carbohydrate-binding domain. This suggests that they are likely to be near the NH2-terminal end of the polypeptide, a conclusion which is consistent with the sequence results presented below.

Primary Structures of Two MBPs
Isolation and Characterization of cDNA Clones-Oligonucleotide probes designed to be complementary to the MBP mRNA were used to screen rat liver cDNA libraries. In order to obtain a suitable sequence from which to design the probe, a contiguous stretch of noncollagenous amino acid sequence was desired. This was obtained by digesting the collagenase-resistant fragment with clostripain and isolating a large fragment (Cl-1) from the resulting mixture (Fig. 4). When this fragment was sequenced, 29 residues were identified (Table III). Two regions from which the mRNA sequence could be predicted with relatively little ambiguity were selected (Fig. 5). At positions of 2-fold ambiguity, both possible sequences were synthesized by including a mixture of both bases in the coupling reaction. At positions of 4-fold ambiguity, T was selected based on our previously employed reasoning that this might allow an A/T or G/T base pair with some stability, or at worst C/T or T/T pairs which would not be extremely destabilizing (27). In probe 2, a G was placed opposite the first base of the single arginine codon, since C is most commonly found in this position. This made possible the synthesis of 29 base oligonucleotides, thus taking advantage of the high signal to noise ratio which can be obtained using such relatively large probes (37).
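The idea of choosing regions "with relatively little ambiguity" can be sketched as follows. This is a minimal illustration, not the procedure used to design probes 1 and 2: it only ranks peptide windows by the number of codon combinations that could encode them, using the standard genetic code. The peptide and the names `fold_degeneracy` and `best_window` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Number of codons encoding each amino acid (1 for Met and Trp, up to 6).
DEGENERACY = Counter(CODON_TABLE.values())


def fold_degeneracy(peptide):
    """Number of distinct coding sequences that could encode the peptide."""
    return prod(DEGENERACY[aa] for aa in peptide)


def best_window(peptide, n):
    """Return (start, window, degeneracy) for the least ambiguous n-residue window."""
    starts = range(len(peptide) - n + 1)
    i = min(starts, key=lambda s: fold_degeneracy(peptide[s:s + n]))
    window = peptide[i:i + n]
    return i, window, fold_degeneracy(window)


# Hypothetical peptide, not taken from fragment Cl-1.
print(best_window("SLRMWDKFA", 4))
```

Windows rich in Met, Trp, and 2-fold degenerate residues score best, which is why such stretches are attractive starting points for probe design.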
When a rat liver cDNA library prepared in phage λgt11 (20) was screened using oligonucleotide probe 1, two positive plaques were detected out of 200,000 plaques screened. Upon plaque purification, one of these was found to hybridize to probe 2 as well as probe 1 and was selected for further analysis. Sequencing revealed that this clone contains a partial copy of the MBP-C mRNA (see Fig. 6). A restriction fragment which includes the sequence encoding the COOH-terminal 120 amino acids of the protein was used to screen a rat liver cDNA library constructed in a plasmid vector. A single clone of MBP-C was obtained from this library out of 100,000 colonies screened. The complete sequence of the insert in this clone was established (Fig. 7), revealing that it includes the entire protein-coding portion of the MBP-C mRNA.
When the λgt11 library was rescreened using the same restriction fragment probe, two categories of phage were detected. One group, which gave a stronger signal upon hybridization, consisted of partial cDNAs for MBP-C. A total of 15 plaques, representing six independent phage, were detected in 1,000,000 plaques screened. In addition, a group of three more weakly hybridizing plaques, representing two independent phage, was identified. The inserts from this group of phage were excised with EcoRI and ligated into plasmid pSP64 for further analysis. In all cases, sequence analysis revealed that these phage contain coding information for MBP-A. The longest insert, in plasmid pMHL-A, was completely sequenced, with the results shown in Fig. 8.
Comparison of cDNA and Protein Sequences-The sequences of the proteolytic fragments of MBP discussed in the preceding section are shown along with the amino acid sequences deduced from the cDNA sequences in Figs. 7 and 8. All of the tryptic fragments obtained can be accounted for in one of the two predicted sequences. This includes a number of double sequences in which two tryptic fragments were not resolved by peptide mapping and were simultaneously subjected to Edman degradation. Peptide T7 is found in both MBP-A and MBP-C. In addition, the NH2-terminal sequence of the collagenase-resistant portion of MBP-C (C-1) and the sequence of the large clostripain fragment used to generate the oligonucleotide probes (Cl-1) match exactly the sequences predicted from the cDNA. The calculated amino acid compositions of both MBPs are very similar to the composition determined for the mixture (10, 13). The calculated molecular weights are 24,000 for MBP-C and 23,500 for MBP-A. These values are somewhat lower than the values estimated from gel electrophoresis. Two factors may account for this discrepancy: the collagen-like sequences may lead to anomalous mobility on the gels, and there may be some carbohydrate attached to the mature proteins (see below). Two differences between the protein and nucleic acid sequencing results were observed. As noted above, tryptic peptides T2 and T7 were found to contain several residues of hydroxyproline. These residues are encoded as proline residues in the cDNA. This is expected, since hydroxylation occurs as a post-translational modification (33). The amount of hydroxyproline in the MBP preparation suggests that additional proline residues in MBP-A and MBP-C are probably modified in this manner. Interestingly, 2 residues of peptide T2 which were not identified by protein sequencing were found to be lysine residues in the cDNA-derived sequence. It is possible that these residues may be hydroxylated, as is commonly the case in collagenous sequences (33). Although the carbohydrate composition of MBP has been found to be very low (13), the amounts detected leave open the possibility that glycosylated hydroxylysine residues might be present. There are no Asn-X-Ser(Thr) sequences in either MBP-A or C, so the small amount of carbohydrate present must be attached by an O-glycosidic linkage (38). Attempts to determine the amount of hydroxylysine present in the MBP preparation were unsuccessful.
M. McPhaul, manuscript in preparation.
Comparison of the NH2-terminal sequences of MBP-A and MBP-C with the sequences encoded by the cDNAs reveals that each protein is preceded by a leader peptide. In the case of MBP-C, this extension has the typical hydrophobic character of a signal sequence for directing the insertion of this protein through a microsomal membrane (39). The site of signal cleavage follows the general pattern of recognition seen in other signal peptides, since the residues at positions -1 and -3 (Ala and Val) are amino acids commonly found at these critical positions (40). The presence of an in-frame stop codon 20 nucleotides upstream from the indicated initiator methionine residue eliminates the possibility that the signal sequence is preceded by a further extension. It is noteworthy that this stop codon is immediately preceded by a methionine codon, indicating that in this case protein translation is not initiated at the first AUG sequence in the mRNA.
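The reading-frame argument in this paragraph can be made concrete with a small sketch. The nucleotide sequence below is hypothetical and was constructed only to reproduce the situation described (an upstream ATG followed directly by an in-frame stop codon); the function name `upstream_in_frame_stop` is likewise an assumption.

```python
STOPS = {"TAA", "TAG", "TGA"}


def upstream_in_frame_stop(cdna, atg_index):
    """Walk upstream from the initiator ATG in steps of three nucleotides.

    Returns the index of the nearest in-frame stop codon, or None if the
    5' end of the sequence is reached without meeting one.
    """
    i = atg_index - 3
    while i >= 0:
        if cdna[i:i + 3] in STOPS:
            return i
        i -= 3
    return None


# Hypothetical 5' region: ATG, TGA (stop), six sense codons, then the
# initiator ATG at index 24 followed by the start of the coding sequence.
cdna = "ATGTGA" + "CCTGCAGACCTGGTCACC" + "ATGAGCCTG"
initiator = 24

stop = upstream_in_frame_stop(cdna, initiator)
print(stop, cdna[stop:stop + 3])          # position and identity of the stop
print(cdna[stop - 3:stop] == "ATG")       # is the stop preceded by an ATG?
```

An in-frame stop between an upstream ATG and the proposed initiator means that translation starting upstream could not read through into the signal sequence.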
Characterization of MBP-A Signal Sequence by Sequencing of Genomic Clones-The longest available cDNA for MBP-A does not cover the entire protein-coding portion of the mRNA, so the NH2-terminal sequence of the initial translation product was determined by characterizing a portion of the gene for this protein. A rat genomic library was screened using a restriction fragment derived from the 5' end of plasmid pMBP-A (see Fig. 6). Several overlapping phage were isolated and characterized. The sequence of the relevant portion of the MBP-A gene is presented in Fig. 9. This sequence overlaps the 5' end of the cDNA sequence, and reveals that MBP-A is also preceded by a hydrophobic signal sequence. The absence of any potential splice acceptor sites (41) between the initiator methionine codon and the 5' end of the cDNA sequence ensures that no intron interrupts this portion of the sequence, although a possible splice acceptor site is found 10 nucleotides upstream of the AUG codon.

Common Structural Features of MBP-A and MBP-C
The sequences of the two MBPs are strikingly homologous to each other throughout their length. As shown in Fig. 10, the two proteins can be aligned with only four gaps. In this alignment, the sequences of the mature proteins are 56% identical. As summarized in Fig. 11, the overall organization of the proteins is also identical. Each consists of an NH2-terminal signal sequence which is removed from the mature protein, followed by a short segment (18-20 amino acid residues) rich in cysteine residues, then a collagenous domain of 53-59 residues, and finally a COOH-terminal noncollagenous domain of 148-150 amino acids. As expected from the small difference in mobility of the reduced and carboxymethylated proteins, MBP-C is slightly longer than MBP-A.
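For readers unfamiliar with how a figure such as "56% identical" is obtained from a gapped alignment, a minimal sketch follows. The two aligned strings are hypothetical, and the convention used here (identities divided by the number of columns in which neither sequence has a gap) is an assumption; the text does not state which denominator was used.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# Hypothetical aligned segments with one gap in each sequence.
seq_a = "GKDG-RDGTKGE"
seq_b = "GKAGPRDG-KGD"
print(percent_identity(seq_a, seq_b))
```

Other conventions (for example, dividing by the length of the shorter sequence) give slightly different values, so identities reported with different gap treatments are not strictly comparable.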
Signal Sequences-Since the MBPs are found in the circulation and inside microsomes as well as possibly at the cell surface (13, 15), the presence of signal sequences is an expected feature of the molecules. The signal sequences of the two MBPs are the least conserved portions of the molecules; however, each has the general features of NH2-terminal signal sequences found on almost all eukaryotic secretory proteins (39, 42). In addition to being hydrophobic in character, both signal sequences contain cysteine residues near the midpoint. This is a common, although not universal, characteristic of signal sequences (42). The sites of cleavage by signal peptidase conform to the general rule that the amino acid in position -1 tends to be a residue with a small side chain (alanine and serine in MBP-C and MBP-A, respectively). The lack of specific conservation of residues within the signal or at the cleavage boundary is typical of comparisons made between signal sequences in otherwise highly homologous proteins (42).
Interchain Disulfide Bonds-The presence of a number of cysteine residues in the short NH2-terminal noncollagenous segment of both proteins is consistent with the fact that the interchain disulfide bonds are removed when the MBPs are digested with collagenase. Since both disulfide-linked dimers and trimers can be observed when the gels are run under nonreducing conditions, at least two of the cysteines in this NH2-terminal domain must be involved in disulfide bonds. The pattern of multimer formation may be somewhat heterogeneous, since both dimers and trimers are seen. As discussed above, the relative abundance of these species is not consistent with the hypothesis that one represents MBP-C and the other MBP-A.
Collagen-like Domains-Portions of the collagen-like domains of the two MBPs show extremely strong conservation. Of particular interest is the presence of a single, identical interruption in the Gly-X-Y repeat structure: the sequence Gly-Gln-Gly is found in a highly conserved portion of both collagenous domains. In all other portions of the collagen-like domains, the sequences resemble the triple helix-forming segments of collagen in that a large number of the X and Y positions are occupied by proline residues. In addition, the protein sequence work indicates that many of the proline residues in the Y positions are hydroxylated. The composition of 1.3 mol % 4-hydroxyproline indicates that approximately 3 residues are present per polypeptide chain. Since there are four prolines in Y positions in MBP-A and five in MBP-C, a significant fraction of the potentially hydroxylated residues must actually be derivatized. In both MBP-A and C, one of the prolines in a Y position directly precedes the irregularity in the Gly-X-Y repeat discussed above; this may prevent its hydroxylation.
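The estimate of approximately 3 hydroxyproline residues per chain follows from simple arithmetic, sketched below. The chain lengths are an assumption: they are obtained by summing the domain sizes quoted in the text (18-20, 53-59, and 148-150 residues), giving a range of roughly 219-229 residues for the mature polypeptides. The function name `hyp_per_chain` is hypothetical.

```python
def hyp_per_chain(mol_percent, chain_length):
    """Residues per chain implied by a composition given in mol %."""
    return mol_percent / 100.0 * chain_length


MOL_PERCENT_HYP = 1.3  # 4-hydroxyproline content reported for the MBP preparation

# Approximate lower and upper bounds for the mature chain length (assumed,
# from the sum of the domain sizes quoted in the text).
for length in (219, 229):
    print(length, round(hyp_per_chain(MOL_PERCENT_HYP, length), 2))

# Fraction of Y-position prolines that would be hydroxylated if ~3 residues
# per chain are modified (four such prolines in MBP-A, five in MBP-C).
for name, y_prolines in (("MBP-A", 4), ("MBP-C", 5)):
    print(name, 3 / y_prolines)
```

Both bounds give a value just under 3, and the implied fractions (0.75 and 0.6) are consistent with the statement that a significant fraction of the Y-position prolines must be hydroxylated.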
Role of Multiple MBPs-The existence of two distinct proteins in preparations of MBP raises the question of whether these proteins have distinct biological functions. For example, it is possible that the serum form of MBP may be enriched in MBP-A compared with MBP-C; in this case, the MBP-A isolated from liver might represent "contamination" from circulating protein. Alternatively, the two MBPs may have distinguishable binding specificities. The availability of clones for the two proteins will make it possible to investigate this possibility using the methods of in vitro transcription and translation (43).

Comparison of MBPs with Other Mammalian Carbohydrate-binding Proteins
Identification of a Common Carbohydrate-recognition Domain-The collagenase digestion data demonstrate that the mannose-binding portion of MBP-C is located in the COOH-terminal collagenase-resistant domain. Because of the homologous organization of the MBPs, the same is probably true of MBP-A as well. The carbohydrate-binding domains of the rat and chicken endocytic receptors for galactose- and N-acetylglucosamine-terminated glycoproteins have also been localized to the COOH-terminal portions of these molecules (8, 43). It was, therefore, of interest to compare the sequences of these carbohydrate-recognition domains. As shown in Fig. 10, the rat receptor can be aligned with either MBP by introducing five gaps to give 23% sequence identity, whereas the chicken receptor can be aligned with four gaps to show 32% identity. There are certain residues which are conserved in all of the carbohydrate-binding domains, particularly around some of the cysteine residues. These residues may be involved in disulfide bond formation.⁵ These residues are also conserved in other membrane-bound glycoprotein receptors, such as the minor form of the rat asialoglycoprotein receptor (6) and the two forms of human liver asialoglycoprotein receptor (7). Overall, 19 residues in the carbohydrate-recognition domain are conserved in all these proteins. As summarized in Fig. 11, the COOH-terminal approximately 130 amino acid residues of these proteins represent a carbohydrate-recognition domain which has been combined, in the course of evolution, with several other different types of domains: the carbohydrate-recognition domain is attached to a collagenous domain in the MBPs, and to a membrane anchor in the case of the endocytic receptors. This represents a striking example of how conserved domains can be combined in the construction of proteins which serve distinct functions but which also share at least one characteristic activity.
⁵ D. Farrell, E. Hsueh, and K. Drickamer
In spite of the strong sequence conservation in the carbohydrate-recognition domains, it should be noted that each of these proteins has a distinct pattern of carbohydrate-binding specificity: the MBPs bind mannose and N-acetylglucosamine (10), whereas the rat asialoglycoprotein receptor is specific for galactose and N-acetylgalactosamine, and the chicken endocytic receptor recognizes N-acetylglucosamine (1). It will require further investigation to determine which residues in the carbohydrate-recognition domain are responsible for conferring these distinct specificities on the binding domains.
Multimeric Structure-The presence of triple helices and disulfide-linked trimers indicates that the MBPs must be at least trimeric in structure. Sedimentation analysis yields a molecular weight of 200,000, which suggests that two trimers may be associated in a hexamer in the native molecule. The exact nature of the disulfide bonding pattern and the question of whether the two MBPs form independent or mixed hexamers will have to be resolved in future studies. In the light of the homology between MBPs and the chicken hepatic lectin discussed above, it is interesting that MBP is a hexamer, since the chicken protein has also recently been shown to be hexameric.⁶ The clustering of binding sites may be a characteristic feature of receptors which have high affinity for oligosaccharides with multiple terminal sugars.
Alternative Modes of Biosynthesis-It has recently been shown that the membrane-spanning segment of the rat asialoglycoprotein receptor serves as an uncleaved, internal signal sequence which interacts with the signal recognition particle and directs this protein to be inserted into membranes in a transmembrane orientation with the COOH-terminal carbohydrate-recognition domain on the noncytoplasmic side of the membrane (44). In contrast, the MBPs are delivered to the noncytoplasmic side of the membrane by a more conventional cleaved, NH2-terminal signal sequence. This indicates that the folding of the homologous carbohydrate-recognition domains in these two proteins is not likely to depend on a specific insertion pattern. This is consistent with recent evidence that the carbohydrate-recognition domain of the rat asialoglycoprotein receptor is functional when it is directed to the lumen of microsomes by an NH2-terminal, cleaved signal sequence derived from preproinsulin (43).
... domains has parallels in other proteins which also have collagen-like regions. These include the asymmetric forms of acetylcholinesterase (35), complement protein C1q (34), and the apoprotein of pulmonary surfactant (45). The latter two examples are particularly informative. Each of these proteins consists of polypeptides which are linked by disulfide bonds that immediately precede collagenous domains at their NH2 termini, whereas the COOH-terminal domain of each is noncollagenous.
In addition to this overall organizational homology, the dog pulmonary surfactant apoprotein shows amino acid sequence homology with the MBPs (see Fig. 10). Pulmonary surfactant apoprotein and MBP-A can be aligned, with only three gaps, to show 30% identity in sequence. Seven gaps must be introduced to achieve an equivalent degree of identity with MBP-C. The strong homology between the MBPs and dog pulmonary surfactant apoprotein suggests that these proteins may be functionally related to each other. It may be particularly significant that virtually all of the "invariant" residues in the carbohydrate-recognition domains of the four carbohydrate-binding proteins shown in Fig. 10 are also found in the pulmonary surfactant apoprotein. This suggests that this protein may also have carbohydrate binding activity, a suggestion which can be tested experimentally. Alternatively, since the pulmonary surfactant apoprotein is known to have calcium ion-dependent lipid binding activity (46), it is possible that the binding sites for carbohydrates and phospholipids are structurally analogous. Finally, it is also interesting that the single interruption in the Gly-X-Y repeat pattern in all of these proteins falls in the same part of the collagenous domain, suggesting that this feature may have significance for the interaction of all of these proteins with a common effector protein.
The homology between the MBPs and complement C1q lies largely at the level of overall organization. Both of these proteins have short NH2-terminal domains which are involved in interchain disulfide bond formation, followed by triple helix-forming domains. In each case, the pattern of Gly-X-Y repeats is interrupted near the middle. There are, however, distinct differences. The collagen-like portion of C1q is somewhat longer (25-27 Gly-X-Y repeats) than the corresponding domains of the MBPs. Also, the NH2-terminal domains of the MBPs contain several cysteine residues, whereas the corresponding portions of C1q contain only one. The COOH-terminal domain of each of these proteins is involved in a ligand binding activity: mannose-terminated oligosaccharides in the case of the MBPs and the Fc portion of immunoglobulins in the case of C1q. Although none of the conserved residues shown in Fig. 10 is present in C1q, and there is no clear homology between the two COOH-terminal domains, it is nonetheless intriguing that the fixation of complement is dependent on glycosylation of the Fc domain of the immunoglobulin (47).
The analogies between the MBPs and pulmonary surfactant apoprotein and complement protein C1q may provide some indication of the natural function of the MBPs. It is possible that both the MBPs and pulmonary surfactant apoprotein participate in a primitive form of immune response. This would be analogous to the proposed function of the C-reactive protein, which recognizes the capsular polysaccharide of certain bacteria (5). Thus the presence of mannose might be taken as a relatively nonspecific marker for a bacterial surface in order to induce an effector function analogous to that induced more specifically by C1q. The presence of MBP in the circulation is consistent with this possibility. The pulmonary surfactant protein might serve a similar protective function at the surface of the lung.
Relationship to Other Carbohydrate-binding Proteins in Serum-C-reactive protein and serum amyloid proteins are two members of the pentraxin family, a group of pentameric and decameric proteins which show strong sequence homology with each other and which have characteristic binding activities (48). Comparison of the sequences of the MBPs with the pentraxins reveals that, as in the case of C1q, the conserved residues shown in the carbohydrate-recognition domains of Fig. 10 are not found in the pentraxins. The pentraxins contain a pair of highly conserved cysteine residues which are linked in a disulfide bond; there is no homology in their placement or surrounding residues with the 4 conserved cysteine residues in the carbohydrate-binding proteins shown in Fig. 10. Thus the carbohydrate-recognition activity exhibited by some members of the pentraxin family has a structural basis distinct from the family of proteins which contains the MBPs.
... terminus of the invertebrate protein (see Fig. 11). The presence of homologous carbohydrate-recognition domains in proteins of species which diverged as long ago as flies and rats suggests that this domain originated relatively early in evolution. Interestingly, it has been speculated that the protein isolated from insect hemolymph may also be involved in a primitive immune response (50).
In summary, the structures of the MBPs reveal that these proteins are part of a group of proteins which share carbohydrate-recognition domains but which combine this carbohydrate-recognition capacity with quite different "effector" domains. The relationship between the structures of the MBPs and other proteins, such as complement protein C1q and pulmonary surfactant apoprotein, may serve as clues in the elucidation of the functions of each of these proteins.

... collagenase (purified from Clostridium histolyticum; Code CLSPA) was a ... Yeast mannan and invertase, and 4-hydroxyproline and ... standards were obtained from Si-...
Mannose-Sepharose was prepared by coupling of mannose to Sepharose 6B with divinyl sulfone (17). Invertase-Sepharose (1.5 mg protein/ml resin) was prepared by the CNBr method (18). Restriction enzymes were from New England Biolabs. T4 DNA ... previously described (6, 19).
Screening with restriction fragment probes followed the same procedure except that the concentration of formamide was increased to 50% for prehybridization and hybridization and the filters were washed at 65 °C. The plasmid cDNA library was plated directly onto agar plates, then transferred to nitrocellulose filters, replicated and fixed as previously described (28). Hybridization and washing were performed as described for the phage ... previously described (31).

Fig. 2. ... of tryptic peptides of MBP. A preparation of MBP was dialyzed exhaustively against water, and the resulting insoluble material was reduced and carboxymethylated. This fraction is enriched in MBP-A relative to the original preparation. Approximately 2 mg of this material was subjected to trypsin digestion.