The polymorphic integumentary mucin B.1 from Xenopus laevis contains the short consensus repeat.

The frog integumentary mucin B.1 (FIM-B.1), discovered by molecular cloning, contains a cysteine-rich C-terminal domain which is homologous with von Willebrand factor. With the help of the polymerase chain reaction, we now characterize a contiguous region 5' to the von Willebrand factor domain containing the short consensus repeat typical of many proteins from the complement system. Multiple transcripts have been cloned, which originate from a single animal and differ by a variable number of tandem repeats (rep-33 sequences). These different transcripts probably originate solely from two genes and are presumably generated by alternative splicing of a huge array of functional cassettes. This model is supported by analysis of genomic FIM-B.1 sequences from Xenopus laevis. Here, rep-33 sequences are arranged in an interrupted array of individual units. Additionally, Southern analysis revealed genetic polymorphism between different animals, which is predicted to lie within the tandem repeats. A first investigation of the predicted mucins with the help of a specific antibody against a synthetic peptide determined the molecular mass of FIM-B.1 to be greater than 200 kDa. Here again, genetic polymorphism between different animals is detected.


The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M83682 and M83683.
To whom reprint requests should be addressed.

(FIMs). The first of these mucins detected by cDNA cloning was FIM-A.1 (Hoffmann, 1988). To our knowledge, it is the first entirely sequenced mucin described. It consists of a central repetitive threonine-rich domain (acidic type-A repeats), which is highly O-glycosylated, and terminal cysteine-rich P-domains. Expression of FIM-A.1, which is genetically polymorphic, occurs exclusively in mucous glands. Only cone cells (Spannhof, 1953/1954; Els and Henneberg, 1990) at the proximal pole of the glands do not synthesize FIMs (Hauser et al., 1990). By cDNA cloning we also discovered a second mucin, FIM-B.1 (Probst et al., 1990). So far, this mucin has been characterized only partially. It contains acidic type-B repeats which are homologous with repetitive elements found in FIM-A.1. Located at the C-terminal end is a cysteine-rich domain showing sequence similarities with the C-terminal end of von Willebrand factor (vWF), with pro-collagen, and with thrombospondin. Recently, this module has also been detected at the C-terminal end of submaxillary mucins (Eckhardt et al., 1991). In vWF, this domain is responsible for dimerization (Voorberg et al., 1991), and a similar function could be expected in mucins. This would be in line with their general tendency to form disulfide-linked oligomers, which seems to be necessary for the gel-forming capacity of mucins (Dekker and Strous, 1990). The molecular function of FIMs has not been established. Besides protecting the delicate skin of the frog (as is typical of mucins in general; Neutra and Forstner, 1987), we also speculate on a specific function conferred mainly by the characteristic cysteine-rich modules. One possibility would be defense against specific microbial infections of the skin, as previously proposed (Hauser et al., 1990).
To determine whether the known FIM-B.1 sequence (Probst et al., 1990) and the new short consensus repeat (SCR)-containing sequence (pFIM-5'-21) belong to the same mRNA, a further polymerase chain reaction (PCR) was carried out using REP6

The abbreviations used are: FIMs, frog integumentary mucins; vWF, von Willebrand factor; SCR, short consensus repeat; PCR, polymerase chain reaction; bp, base pair(s).
Plasmid DNA was purified with Qiagen-tip 20 (Diagen), and double-stranded DNA was sequenced with a Sequenase kit (version 2.0, U.S. Biochemicals) using [α-35S]dATP for labeling. Computerized analysis and homology searches have been described previously (Hauser and Hoffmann, 1991).
Isolation of Genomic Clones-In order to obtain genomic sequences encoding FIM-B.1, a library kindly provided by Dr. G. Spohr (Genève) was screened. This library was constructed using partially HaeIII-digested DNA from X. laevis livers. Fragments were inserted into Charon 4A λ phages via EcoRI linkers.
Southern Analysis-Genomic DNA was prepared from the liver of single animals according to Herrmann and Frischauf (1987). Alternatively, a DNA extraction kit (Stratagene) was used. Portions of 30 µg of DNA were digested with various restriction enzymes and separated on a 0.7% agarose gel. Transfer to Immobilon™-N membrane and the conditions for hybridization and washing were as recommended by the supplier (Millipore). For hybridization, two different restriction fragments of FIM-B.1 clones labeled with 32P by random priming (Boehringer Mannheim) were used, viz. the HindIII fragment of clone pREP-3'-44 (corresponding to positions 736-1118; Probst et al., 1990), which encodes part of the domain homologous with vWF, or the insert of pREP-5'-119, which consists only of tandemly arranged rep-33 sequences.
Production of Antiserum FIM-1 and Western Analysis-In order to obtain a specific antiserum against FIM-B.1, the synthetic peptide CADPGIPMYGKRNGSSFLHGDV (FIM-1, kindly provided by C. Hoffmann, Max-Planck-Institut für Psychiatrie) was coupled through the cysteine of the peptide to keyhole limpet hemocyanin with m-maleimidobenzoyl-N-hydroxysuccinimide ester as the coupling reagent (Doolittle, 1986). Rabbits were immunized as previously described (Königstorfer et al., 1989).

X. laevis integumentary secretions were collected by carefully scraping the skin of the animals, and Western blots were performed as reported previously (Königstorfer et al., 1989) using a 1:200 dilution of the FIM-1 antiserum.
Enrichment of FIM-B.1-As described previously (Hauser et al., 1990), the skin of a single adult X. laevis was removed, cut with scissors, and sonicated. After centrifugation, the supernatant was further fractionated by chromatography on a Sephadex G-200 column (2.7 × 60 cm) at 4 °C using 50 mM NaCl, 50 mM Tris-HCl (pH 7.0) as elution buffer. The elution profile was monitored by continuously measuring the absorption at 280 nm. Aliquots (20-50 µl) of each collected 5-ml fraction were subsequently analyzed on Western blots with the anti-FIM-B.1 antiserum FIM-1 or the anti-FIM-A.1 antiserum SPL-5 (Hauser et al., 1990).
Translation in the open reading frame designates different domains: regions consisting predominantly of repetitive elements with the major motif GESTPAPSETT (type-B repeats; underlined in Fig. 1) and a cysteine-containing region, which separates the repetitive domains.
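As a minimal sketch of how the type-B repeats could be located in a translated reading frame, the following Python snippet scans a peptide for the motif GESTPAPSETT. The example peptide is hypothetical: it is assembled only from motifs named in the text (GESTPAPSETT, ATLKST, VPS) and is not taken from the reported sequence. The snippet also assumes that the name rep-33 refers to a 33-bp unit, which would match the 11 codons of the motif.

```python
import re

TYPE_B_MOTIF = "GESTPAPSETT"  # major type-B repeat motif named in the text
REP33_LENGTH_BP = 33          # assumption: "rep-33" denotes a 33-bp repeat unit


def find_type_b_repeats(peptide: str) -> list[int]:
    """Return the 0-based start position of every type-B motif in a peptide."""
    return [m.start() for m in re.finditer(re.escape(TYPE_B_MOTIF), peptide)]


# Hypothetical peptide built from motifs mentioned in the text (illustration only).
example_peptide = "SGESTPAPSETTATLKSTVPSVPSGESTPAPSETTGESTPAPSETT"

positions = find_type_b_repeats(example_peptide)
print(f"type-B repeats found: {len(positions)} at positions {positions}")

# Consistency check under the assumption above: 11 residues x 3 nt = 33 nt.
print(len(TYPE_B_MOTIF) * 3 == REP33_LENGTH_BP)
```

Counting motif occurrences in this way is one simple means of comparing transcripts that differ in their number of tandem repeats; it does not account for degenerate repeat copies.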
Partial Analysis of Genomic FIM-B.1 Sequences-Figs. 2 and 3 represent a 3.5-kilobase long region from the genomic clone λFIM-B-24 after partial digestion with HindIII. This genomic subclone (GFIM-B-241H-Al) does not code for exactly the same region as is seen in the cDNA but is probably located somewhere 5' to the part shown in Fig. 1. Furthermore, the genomic sequence and the cDNA sequences originate from two different animals. The genomic sequence consists of four highly homologous HindIII fragments arranged in tandem. Each fragment is 919 bp long and has its own characteristics. As a hallmark, it contains a single rep-33 sequence (encoding a type-B repeat) surrounded by typical intron sequences according to Breathnach and Chambon (1981). Only the second HindIII fragment is shortened by a deletion of 169 bp which also eliminates most of the rep-33 sequence.
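The fragment sizes quoted above can be checked against the stated length of the region with a short calculation. This is only a sketch: it assumes that 919 bp is the length of an undeleted HindIII fragment and that the 169-bp deletion is the only length difference among the four fragments.

```python
FRAGMENT_BP = 919   # length of one HindIII fragment, as stated in the text
N_FRAGMENTS = 4     # four tandemly arranged fragments
DELETION_BP = 169   # deletion in the second fragment

# Assumption: all fragments are 919 bp except the second, which carries the deletion.
fragment_lengths = [FRAGMENT_BP] * N_FRAGMENTS
fragment_lengths[1] -= DELETION_BP

total_bp = sum(fragment_lengths)
print(fragment_lengths)          # per-fragment lengths in bp
print(f"{total_bp / 1000:.1f}")  # total length in kilobases
```

Under these assumptions the total comes to 3507 bp, in agreement with the 3.5-kilobase region described.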
Also striking are four regions encoding the tripeptide VPS, which occurs repeatedly in the cDNA (Fig. 1). Interestingly, only two of these regions can be considered functional exons since they contain potential splice junctions (Breathnach and Chambon, 1981). Furthermore, each VPS-encoding region is always preceded by an 18-bp long sequence which has also been found at the cDNA level (encoding the peptide ATLKST, see Fig. 1). But in every case, functional exon-intron boundaries are missing, which excludes the possibility that these sequences are used as exons.
Taken together, due to point mutations at potential intron-exon borders, not all genomic regions homologous to cDNA sequences are accessible to splicing. Thus, many potential exon sequences have lost the ability to be functionally expressed.
Southern Analysis-Fig. 4 illustrates the analysis of the FIM-B.1 gene from X. laevis. Hybridization of genomic DNA from three different individuals with a non-repetitive probe (Fig. 4A), which encodes part of the domain homologous with vWF (Probst et al., 1990), resulted in identical restriction patterns. In every digestion (EcoRI and HindIII) two or more bands are visible. Similar results were obtained with other restriction enzymes, which also showed mainly double bands (data not illustrated). This leads to the assumption that two genes for FIM-B.1 exist per haploid genome. In contrast, use of a probe containing the rep-33 sequence (Fig. 4B) gives rise to multiple bands, which show polymorphism between the three different animals.

Western Analysis-Since FIM-B.1 has only been characterized at the nucleic acid level, an antibody (antiserum FIM-1) was raised against a non-repetitive region deduced from cDNA cloning (see Fig. 1) and used to investigate X. laevis skin secretions.
As shown in Fig. 5, skin secretions of five different animals (lanes a-e) show characteristic polymorphic patterns of major double bands of >200 kDa when immunostained with antiserum FIM-1. From the size of these bands, one could roughly estimate that the parts of FIM-B.1 characterized so far (Fig. 1 and Probst et al., 1990) probably represent more than 50% of the full-length sequence. The occurrence of double bands could indicate the existence of two genes for FIM-B.1. A closer look reveals that nearly every one of those

Fig. 1 (legend fragment; clone labels pFIM-6.2-13, 14, 11, 15, 17). ... et al., 1990) is shown. Within the variable region (from the third to the fifth row) asterisks were introduced to maximize homology (representing gaps), whereas bars indicate identical nucleic acid residues. Restriction sites, repetitive elements (GESTPAPSETT), and potential N-glycosylation sites are underlined. Cysteine residues are encircled. The sequence selected for the synthetic peptide (FIM-1) is indicated by the dotted line.

Fig. 4 (legend fragment). ... A) or with the insert of pREP-5'-119 (rep-33 sequence, B). As size marker, a 1-kilobase ladder (Bethesda Research Laboratories) was used. Genomic DNA loaded in lanes b and c originates from the same individuals as the secretions investigated in Fig. 5 (lanes a and b, respectively).

Fig. 5 (legend fragment). ... FIM-1 (lanes a-e) or the preimmune serum (lane j). Skin secretions in lanes a and b originate from the same animals as the genomic DNA investigated in Fig. 4 (lanes b and c, respectively). Secretions from lane d were from the same individual also used in Fig. 6.
animal from lane d did not stain with the preimmune serum

mRNAs from a Single Individual with Variable Number of Tandem Repeats-The astonishing variety of transcripts originating from a single animal (Fig. 1) is in line with the polydisperse nature of FIM-B.1 mRNA observed after Northern analysis (Hoffmann, 1988; Probst et al., 1990). Probably only two FIM-B.1 genes/haploid genome exist (Fig. 4A), and the question arises as to how these different FIM-B.1 transcripts, which differ within a variable repetitive domain, are created. The existence of two different genes would also be in agreement with the pattern of major double bands in the Western analysis (Fig. 5) and the quasi-tetraploid genome of X. laevis (Thiébaud and Fischberg, 1977; Kobel and Du Pasquier, 1986) resulting from a duplication of the genome about 30 million years ago (Bisbee et al., 1977). Similar results were obtained for other genes of X. laevis, e.g. vitellogenin (Wahli et al., 1979), pp60-src (Steele, 1985), cytokeratin no. 3 (Hoffmann et al., 1988), and insulin (Shuldiner et al., 1989).

Based on various control experiments, it is very unlikely that the different clones are just artificially generated by PCR. For example, amplification of repetitive sequences using cDNA templates derived from single animals always resulted in polydisperse products. In contrast, from a single cDNA clone consisting mostly of rep-33 sequences a discrete band was obtained, indicating that no deletions/insertions were introduced. At this stage, however, PCR or cloning artifacts can never be totally excluded.
Alternative splicing of a huge array of cassettes can serve as a model explaining the origin of the variable repetitive part in FIM-B.1. Recently, a similar situation has been described for the membrane cofactor protein of the complement system (Post et al., 1991). In Fig. 1, sequences were arranged so that potential cassettes can easily be seen. It should be mentioned, however, that additional clones were isolated (e.g. pFIM-6.2-12; data not illustrated) which suggest an even more complex pattern than shown in Fig. 1. Therefore, the gene structure is probably more complicated than one would expect from

has also been described for human tracheobronchial mucin cDNAs (Aubert et al., 1991). In this preliminary report, however, the number of genes is unknown. This still allows for the possibility that a multigene family exists, similar to that of the apo-polysialoglycoproteins from rainbow trout eggs (Sorimachi et al., 1988), from which the different transcripts originate.
The enormous complexity of the FIM-B.1 gene can only be roughly estimated from the partial analysis presented in Figs. 2 and 3. Generally, single rep-33 sequences or units encoding the tripeptide VPS are separated by introns as predicted from the cDNA sequences (Fig. 1). Taken together, only about 3% of the genomic sequence presented in Fig. 3 has the potential of being used as exons. Thus, the organization of the FIM-B.1 gene would be in agreement with the cassette model discussed before. Interestingly, this situation obviously differs significantly from that of the MUC1 (Lancaster et al., 1990; Spicer et al., 1991) and MUC2 loci (Toribara et al., 1991), where tandem repeats are organized in clusters without being separated by introns.
Genetic Polymorphism between Different Animals-As described previously for FIM-A.1 (Hauser et al., 1990), each individual frog can be characterized at the protein level by its specific pattern of FIM-B.1 (Fig. 5), and Southern analysis of the same animals shows this allelic variation at the DNA level.

Fig. 7 (legend fragment). ... Fig. 1) with SCRs from various proteins. hCF-B, human complement factor B (Mole et al., 1984); hC2, human complement factor 2 (Bentley, 1986); hβC4b-bp, β chain of human complement component C4b-binding protein (Hillarp and Dahlbäck, 1990); hβ2-GP, human plasma β2-glycoprotein I (Lozier et al., 1984). The consecutive number of each selected SCR in the corresponding protein is given in parentheses. Identical amino acid residues when compared with FIM-B.1 are enclosed in boxes.
Homologies-Analysis of the predicted protein sequence based on the FASTA program (Pearson and Lipman, 1988) clearly showed that the new cysteine-containing module in FIM-B.1 (see Fig. 1) represents a so-called short consensus repeat (or sushi structure). As outlined in Fig. 7, there is significant homology between this module in FIM-B.1 and SCRs of various proteins, notably in the conservation of 4 cysteine residues, a phenylalanine-tyrosine, a glycine, a tryptophan, and a proline residue (Reid et al., 1986). In analogy with previous reports (Lozier et al., 1984; Janatova et al., 1989), the 4 cysteines are presumably disulfide-bonded in the pattern Cys-1/Cys-3 and Cys-2/Cys-4. SCRs generally consist of about 60 amino acid residues, and they have been described in a variety of different proteins (Perkins et al., 1988). The vast majority of these proteins belong to the complement system and are characterized by their interaction with C3b or C4b. Many non-complement proteins bearing SCRs have also been identified (Ichinose et al., 1990; Bork, 1991; Hessing, 1991), e.g. β2-glycoprotein, haptoglobin, interleukin-2 receptor, ELAM-1, the vaccinia virus 35-kDa secretory polypeptide, and the b-subunit of factor XIII. Remarkably, many of those are involved in adhesive processes. The precise function of SCRs is currently not understood at the molecular level, but in general they might represent protein-binding modules.
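The presumed disulfide pattern (Cys-1/Cys-3 and Cys-2/Cys-4) can be expressed as a simple pairing rule over the cysteine positions of a module. The Python sketch below is illustrative only: the function name and the toy sequence are hypothetical, the toy sequence is far shorter than a real SCR of about 60 residues, and the pairing is the presumed pattern cited above rather than an experimentally determined one for FIM-B.1.

```python
from typing import Optional


def presumed_scr_disulfides(module: str) -> Optional[list[tuple[int, int]]]:
    """Pair cysteines as Cys-1/Cys-3 and Cys-2/Cys-4.

    Returns 0-based residue index pairs, or None if the module does not
    contain exactly four cysteines (the pattern is then not applicable).
    """
    cys = [i for i, residue in enumerate(module) if residue == "C"]
    if len(cys) != 4:
        return None
    return [(cys[0], cys[2]), (cys[1], cys[3])]


# Hypothetical toy sequence with four cysteines (not a real SCR).
toy_module = "ACDEFCGHIKCLMNPCQRST"
print(presumed_scr_disulfides(toy_module))
```

A module with a different number of cysteines returns None, which keeps the rule from being applied to sequences that do not fit the consensus.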
Possible Function-With respect to the hypothesis that FIM-B.1 could help protect frog skin from microbial infections (Hauser et al., 1990), the occurrence of an SCR is challenging since SCRs are presumably involved in the adhesion of various microbial pathogens to their host cells, e.g. via complement receptor types 1 and 2 (CR1, CR2). These receptors mainly consist of SCRs arranged in a tandem array (Fearon and Ahearn, 1989). CR2, the natural receptor for the Epstein-Barr virus, and a surface constituent of Candida albicans, which is also similar to CR2 by virtue of its ability to bind C3d (Edwards et al., 1986; Linehan et al., 1988), are discussed in this respect. Furthermore, Saxena and Calderone (1990) have also identified a secretory C3d-binding protein of C. albicans, a pathogen which can cause mycosis of mucous epithelia, preferentially in humans and warm-blooded animals. One could therefore propose that the SCR in FIM-B.1 is capable of disturbing the adhesion of potential pathogens, e.g. by competitive inhibition. Interestingly, mucin from rabbit colon has been reported to inhibit binding of E. coli (Mack and Sherman, 1991). Alternatively, the SCR in FIM-B.1 would be ideal for interaction with the complement system, as postulated for human salivary secretions (Boackle, 1991).
FIMs may also form a matrix with lectins (Bols et al., 1986), which then could prevent infections caused mainly by fungi since it has been shown that wheat germ agglutinin as well as a lectin from Helix pomatia are antimycotic (Mirelman et al., 1975; Ziemer et al., 1990). Such a matrix could additionally be stabilized by interaction of the cysteine-rich modules of FIMs (acting as "link modules," i.e. P-domains, vWF-homologous domain, and SCR).
Since microbial infection is a constant danger, it also would make sense that FIMs are continuously released from mucous glands (dependent upon the body temperature, Lillywhite, 1971) and not discharged upon strong stimulation like other antimicrobial peptides, since only such a mode ensures continuous protection of the animal. For the future, expression of FIMs in cultured cell lines could help elucidate their molecular function.