Multiple variants of the human lymphocyte homing receptor CD44 generated by insertions at a single site in the extracellular domain.

The human CD44 cell-surface glycoprotein participates in a wide variety of cell-cell interactions including lymphocyte homing and tumor metastasis. The CD44 antigen is known to display extensive size heterogeneity when compared between different tissue sources although the structural basis for this variation is not yet clear. Recently, two further isotypes in addition to the basic hemopoietic form of the CD44 antigen have been cloned and sequenced and these have been found to contain all or part of a 200-400-base pair insert within the extracellular domain, suggesting that the characteristic heterogeneity in the molecule may be generated by a mechanism of alternative splicing. We have obtained further evidence for alternative splicing, and we report here the cloning and sequencing of six different CD44 sequence variants from a variety of cell lines using a combination of expression cloning and the polymerase chain reaction. Comparison of these variants indicates that each is probably assembled by the insertion of five different exon units in tandem into a discrete site within the membrane proximal region of the extracellular domain. One of the variants contains an exon that shares extensive amino acid sequence homology with a recently described rat CD44 variant that mediates tumor metastasis. Another variant contains a new exon that encodes a tandem repeat of the consensus sequence SG for covalent modification with chondroitin sulfate and is expressed predominantly on mammary tumors. We suggest that a mechanism of alternative exon splicing generates much of the observed structural heterogeneity of CD44 and that the particular set of CD44 variants expressed in a single cell may represent a precise postal code directing the final destination of migrating cells and metastatic tumors.

The human cell-surface glycoprotein CD44 was originally implicated as a "homing" receptor, directing the migration of recirculating lymphocytes across the specialized high endothelial venular membranes of specific lymph nodes and inflamed synovia (1,2). However, expression of CD44 is not restricted to lymphocytes but is found in a wide variety of tissues including the central nervous system, lung, epidermis, liver, and pancreas (3) in addition to cell types as diverse as ovarian carcinomas (4) and erythrocytes (5).

* This work was generously supported by grants from the Wellcome Trust. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.

The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) M83324-M83328.
‡ To whom correspondence should be addressed.
§ Present address: University of Southern California, Los Angeles, CA.

Recently, monoclonal antibodies to CD44 have been shown also to recognize both the major cell-surface receptor for hyaluronic acid (6,7) and an abundant surface glycoprotein initially characterized as an extracellular matrix receptor (ECMR III) for collagen and fibronectin (8,9). Other CD44 monoclonal antibodies have been found to stimulate T-cell activation in vitro (10) or to block the development of lymphoid and myeloid precursors when added to in vitro bone marrow cultures (11). Data also link CD44 with a role in tumor metastasis (12), and increased expression of CD44 transcripts is found in many human tumors (13).
It is clear from the many immunochemical studies that CD44 is involved in a diverse range of biological functions. These same studies have also revealed a considerable diversity in the structure of CD44 molecules expressed on different cell types (3, 4, 14, 15). Three broad size categories of CD44 molecules have been identified in previous immunoprecipitation analyses of surface radioiodinated cells: 1) a predominant 80-90 kDa category that is expressed by most cells of hemopoietic origin (1,16), 2) an intermediate size category 110-160 kDa that includes forms expressed primarily on epithelial cells (4,14,15,17,18), and 3) a category that includes very large isoforms (>200 kDa) covalently modified by the addition of chondroitin sulfate (16,18). However, each of these size categories represents more than one species, and the true number of different CD44 variants may be very large.
Recently, cDNAs encoding the CD44 glycoprotein have been cloned from lymphoid tissues of the human (13,19), baboon (20), and mouse (21). In each case a single cDNA species was obtained that encoded an integral membrane protein with an NH2-terminal extracellular domain homologous to the cartilage-link and proteoglycan core proteins found in connective tissue. The protein product of the human CD44 cDNA was shown to migrate with an apparent molecular mass of 80-90 kDa, similar to that of the major species found on lymphoid tissues, and has been termed the hemopoietic form (13,19). Two variants of the basic CD44 molecule from the human and one CD44 variant from the rat have been cloned and sequenced recently. The first of these, termed the epithelial form because of its predominant expression in human epithelial tissues and tumors, encodes a protein that is identical to the hemopoietic form except for the presence of an additional 135 amino acids within the extracellular domain (17,18). The second human variant, reported during the preparation of this article, encodes a protein that is identical to the epithelial form but which contains only the last 69 amino acids of the epithelial insert (22). The rat variant, which was cloned from a rat metastatic pancreatic adenocarcinoma cDNA library, encodes a protein with an additional 162 amino acids located at a similar site in the extracellular domain (12). Interestingly, this variant is expressed only in metastatic pancreatic and mammary tumors and confers metastatic growth characteristics when transfected into non-metastatic rat tumor cells (12).
These findings suggested that additional insertions might be found within the extracellular domain of other CD44 molecules, perhaps consisting of new sequence units. This was a particularly attractive hypothesis since it provided a potential explanation for the characteristic structural and functional diversity of the CD44 molecule.
We report here that additional sequence insertions indeed occur within the extracellular domain of CD44 and that these are restricted to a discrete site in the membrane proximal region. We describe the characterization of six different CD44 variants that are most likely generated by alternative splicing of five or more different exons, often inserted in tandem. Furthermore, three of the variants are new and have not been described previously. One shares extensive sequence homology with the rat "metastatic" CD44 variant (12), and another, containing a site for O-glycosylation with chondroitin sulfate (23), appears to be expressed predominantly on human breast tumors. The potential diversity generated by the postulated mechanism of exon "shuffling" is discussed in relation to the role of CD44 in lymphocyte homing and tumor metastasis.
MATERIALS AND METHODS
Monoclonal Antibodies-The CD44-specific monoclonal antibody 8B25 (IgG2a) used in the expression cloning of full-length CD44 variants A and B (see "Results") was originally submitted to the T-cell panel of the Third Human Leukocyte Antigen Workshop (24) but was not assigned to a recognized cluster. More recent experiments¹ have confirmed that the 8B25 antibody binds the same antigen as the CD44 reference monoclonals A3D8 (25) and H90 (26) but recognizes a distinct epitope expressed on peripheral blood lymphocytes that mediates a monocyte-dependent T-cell activation pathway. The CD44 monoclonals F-10-44-2 (27) and 33-3B3 (25) and the CD45 monoclonal UCHL1 were kindly donated by Professor Andrew McMichael.
RNA and DNA Isolation and Purification-RNA was purified by extraction into guanidinium thiocyanate followed by either caesium chloride density gradient centrifugation (26) or phenol extraction/ethanol precipitation at low pH (28). Genomic DNA was purified according to standard methods (29).
cDNA Libraries and Expression Cloning of CD44 cDNAs-Three different cDNA libraries were used for expression cloning. The first, constructed from phytohemagglutinin-stimulated peripheral blood lymphocytes in the plasmid expression vector pCDM8, has been described previously (30). Two additional cDNA libraries from the T-cell leukemia HPBALL and from large granular lymphocytes similarly constructed in the expression vector pCDM7 (31) were kindly donated by Dr. David Simmons, ICRF Cell Adhesion Laboratory, Oxford, United Kingdom.
For the expression cloning of CD44 cDNAs, the three cDNA libraries were mixed and transfected into 50% confluent COS1 cell monolayers in 10-cm dishes (approximately 50 µg of plasmid DNA/10⁷ cells) in the presence of DEAE-dextran (0.1 mg/ml) and chloroquine (10 µg/ml) for three subsequent rounds of screening with the 8B25 monoclonal antibody. Each round consisted of antibody panning (1/1000 diluted ascites) of the transfected cells followed by rescue of the plasmid using the procedure of Hirt (32) and retransfection into COS1 cells via spheroplast fusion. Each of these manipulations was carried out exactly as detailed originally by Seed and Aruffo (31). After the final round of screening, one-fifth of the rescued plasmid preparation was transformed into Escherichia coli MC1061-P3, and 10 colonies were selected for plasmid isolation by the alkaline lysis method. Each of the cloned plasmids was then retransfected into COS1 cells, which were screened for the expression of CD44 by indirect immunofluorescence microscopy (using the 8B25 monoclonal) as described previously (30). The procedure resulted in the isolation of two different CD44 variant cDNAs. The first corresponded to the previously described hemopoietic variant (13,19) while the second contained an additional 200-bp² insert within the region encoding the extracellular domain of the molecule. These variants are designated CD44A and CD44B in the text.

¹ D. Olive, personal communication.
Amplification from RNA of CD44 Variant Sequences Using the Polymerase Chain Reaction-10 µg of total cellular RNA was used as template for the synthesis of cDNA in reactions that comprised 100 mM Tris-HCl buffer, pH 8.3, together with 50 mM KCl, 10 mM MgCl2, 10 mM dithiothreitol, 35 units/ml RNase inhibitor (Promega Biotech, Madison, WI), 100 µg/ml oligo(dT) (12-18-mer), and 5 mM deoxyribonucleoside triphosphates. First strand syntheses were carried out for 1.5 h at 42 °C in the presence of avian myeloblastosis virus reverse transcriptase (Life Sciences, 1500 units/ml) and the products purified by three rounds of phenol extraction and ethanol precipitation. The cDNA products (one-twentieth of the total in each case) were then used as template for PCR in a reaction mix that comprised Tris-HCl (10 mM, pH 8.3), KCl (50 mM), MgCl2 (1.5 mM), deoxyribonucleoside triphosphates (250 µM), the appropriate 5' and 3' primers (4 µM, see below), and Taq I polymerase (50 units/ml).
The synthetic oligonucleotide PCR primers used in the amplification of CD44 variant inserts were AMP1 (5' TCCCAGTATGACA-

Cloning and Sequencing of Amplified CD44 Products in M13-PCR products were end repaired by the addition of 1 unit of DNA polymerase (Klenow fragment) direct to the reaction mixture, which was reincubated at 37 °C for 30 min before size selection of individual products on 1.2% agarose gels containing ethidium bromide. DNA bands were recovered on DEAE paper prior to phosphorylation (500 units/ml polynucleotide kinase in 50 mM Tris-HCl buffer, pH 7.6, 10 mM MgCl2, 5 mM dithiothreitol, and 100 µM ATP, 37 °C, 30 min) and ligation into the SmaI site of M13mp8. After transformation into E. coli JM101, individual M13 clones were screened for CD44 variant inserts by direct amplification of toothpicked plaques using the PCR primers AMP1 and AMP3 as described above. Selected clones were sequenced by the dideoxy chain termination method (33) using single-stranded phage as template in Sequenase-catalyzed reactions. Where necessary, sequence compressions were resolved by substituting 7-deaza-dGTP for dGTP in the polymerase reaction (34).
Southern Blotting of Amplified CD44 Products-After electrophoresis on agarose gels, PCR products were transferred to nitrocellulose membranes (Hybond-C extra, Amersham, United Kingdom) for hybridization with oligonucleotide probes using standard methods. Blots were hybridized with radiolabeled oligonucleotide probes (32P end-labeled in the presence of polynucleotide kinase, 10 units, 37 °C, 30 min) for 1 h at 30 °C and washed (6 × SSC) for 5-15 min at 5 °C below the calculated Tm for the oligonucleotide/DNA duplex. The oligonucleotide probes were P1 (5' TGTACATCAGTCACAGACCTGCA 3'), which is complementary to a sequence beginning 284 bp upstream of the insert site, and P2 (5' GAGACCAAGACACATTC 3'), which is complementary to a sequence immediately 3' of the insert site.

² The abbreviations used are: bp, base pair(s); PCR, polymerase chain reaction.
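The wash temperature above is defined relative to a calculated Tm, but the text does not state which formula was used. As a rough sketch only, assuming the simple Wallace rule (about 2 °C per A/T and 4 °C per G/C, a common approximation for short oligonucleotides) and assuming that the hyphens printed inside the probe sequences are line-break artifacts, the wash temperatures for P1 and P2 could be estimated as follows. The function names are illustrative and not taken from the paper.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate Tm in deg C with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    # Drop hyphens and spaces, which are treated here as typesetting residue.
    seq = oligo.upper().replace("-", "").replace(" ", "")
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("oligo contains characters other than A, C, G, T")
    return 2 * at + 4 * gc


def wash_temperature(oligo: str, offset: int = 5) -> int:
    """Wash temperature set `offset` deg C below the estimated Tm."""
    return wallace_tm(oligo) - offset


# Probe sequences as given in the text (hyphens removed; an assumption).
PROBES = {
    "P1": "TGTACATCAGTCACAGACCTGCA",
    "P2": "GAGACCAAGACACATTC",
}

if __name__ == "__main__":
    for name, seq in PROBES.items():
        print(name, len(seq), "nt, Tm ~", wallace_tm(seq),
              "C, wash at ~", wash_temperature(seq), "C")
```

The Wallace rule ignores salt concentration and probe concentration, so these values are only a starting point for choosing a wash temperature.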

RESULTS
Expression Cloning of a Novel CD44 Isotype CD44B That Contains a 200-bp Insert-cDNAs encoding the human CD44 antigen were initially cloned from a mixture of cDNA libraries in the plasmid vectors pCDM7 and pCDM8 expressed transiently in COS1 cells. The monoclonal antibody 8B25 (Ref. 25; see "Materials and Methods") was used for panning. This antibody immunoprecipitated the same range of CD44 isotypes as the reference CD44 monoclonals F-10-44-2 and 33-3B3 when tested with a panel of different cell lines including the human plasmacytoma IM-9 and the colon carcinoma HT-29 (data not shown).¹ Two full-length CD44 clones, CD44A and CD44B, were obtained. CD44A corresponded to the original CD44 sequence referred to as the hemopoietic form (13,19). The protein products encoded by the CD44B and CD44A cDNAs were compared after transient expression in COS1 cells and immunoprecipitation with the monoclonal antibody 8B25 (Fig. 1). The CD44B protein migrated with an apparent molecular mass (130 kDa) that was approximately 40 kDa larger than the CD44A protein (90 kDa), a difference that was much greater than would be expected from the size (0.73 kDa) of the insert alone. The CD44B protein was also immunoprecipitated from the Burkitt's lymphoma B-cell line JY and the plasmacytoma cell line IM-9 (not shown) but was not expressed in the T-cell leukemia HPBALL (see also the PCR products in Fig. 2b, below). CD44B appears to be expressed

Treatment of immunoprecipitated CD44B and CD44A (hemopoietic form) with glycopeptidase-F to remove N-linked sugar chains yielded products of approximately 100 and 70 kDa, respectively (Fig. 1), confirming the likelihood that further covalent modifications such as O-glycosylation or chondroitin sulfate addition were present in the insert.
PCR Amplification of cDNAs Encoding Multiple CD44 Isotypes-We used the PCR both to study the tissue specificity of CD44B expression and to determine whether additional CD44 variants containing different inserts might exist. To achieve this we used a 5' primer (AMP1) located upstream of the CD44B insert site and a 3' primer located either just downstream of the site (AMP2) or within the CD44B insert itself (AMP3) to amplify from a panel of different hemopoietic and tumor cell lines known to express a heterogeneous mixture of different CD44 isotypes (see "Materials and Methods" and Fig. 4 for the sequences and locations of the primers).
After electrophoresis and transfer to nitrocellulose, the PCR products were detected by hybridization with the oligonucleotide probe P1 (see Fig. 4 and "Materials and Methods").
Amplification of the region spanning the primers AMP1 and AMP2 was predicted to generate a product of approximately 480 bp from cDNAs encoding the hemopoietic (CD44A) variant, and products of approximately 680 and 880 bp from cDNAs encoding the CD44B and epithelial variant inserts, respectively. As shown by the results in Fig. 2a, the pattern of products obtained was more complex. The hemopoietic form was the predominant product in virtually all of the cell lines surveyed, including the tumor cells, many of which were of epithelial origin. By contrast, the hemopoietic variant transcript was not detected in the colon carcinoma HT-29 using the PCR (Fig. 2a), confirming earlier reports that this particular isoform is not expressed on this cell line (14). No CD44 transcripts were detected in cDNAs prepared from thyroid tissue, which was used as a negative control in these experiments because of the absence of CD44 expression on thyroid follicular cells (25). Interestingly, several other minor products were amplified from many of the cell lines that did not appear to correspond to either the CD44B or the epithelial insert products (Fig. 2a).

Fig. 2. [...] containing different inserted sequences were amplified from cDNA templates prepared from the panel of different cell lines shown by means of PCR using either of two different primer pairs (see "Materials and Methods") that spanned the postulated variable insert site (see text). In panel A the PCR primers used were AMP1 and AMP2, and in panel B the primers were AMP1 and AMP3. The products were detected by autoradiography after electrophoresis on a 1.2% agarose gel followed by transfer to nitrocellulose membrane and hybridization with the 32P-labeled oligonucleotide probe P1 complementary to a common sequence upstream of the insert site (see "Materials and Methods"). The migration positions of products representing the CD44B and CD44D (epithelial variant) inserts were determined using previously characterized controls that were run in parallel lanes.

Parallel amplification of these same cDNA templates using a different 3' primer (AMP3) designed to amplify cDNAs containing the CD44B insert is shown in Fig. 2b. Under these conditions the hemopoietic variant transcripts were excluded and other less abundant transcripts that represented potentially new variants were amplified efficiently.
The predicted product sizes in this case were approximately 420 bp for the CD44B variant and 620 bp for the epithelial variant. Once again the pattern of products obtained was complex, confirming our earlier results with the AMP2 3' primer. At least six different products ranging in size from 420 bp to over 900 bp were visible, and these appeared to be differentially expressed among the cell lines tested (Fig. 2b). CD44B was the predominant product amplified from the hemopoietic cell lines but was also present in metastatic breast tumor tissue, the colon carcinoma HT-29, and the bladder tumor EJ. Some of the hemopoietic cell lines (e.g. peripheral blood lymphocytes, large granular lymphocytes, and the erythroleukemic cell line K562) yielded only the CD44B product whereas others, particularly the metastatic mammary tumor cell line MDA-231 appeared to contain a mixture of all of the different sized products (see Fig. 2b).
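The predicted product sizes quoted for the two primer pairs follow from simple arithmetic on the insert lengths. The sketch below reproduces them under stated assumptions: that the CD44B insert is the 69-amino-acid insert and the epithelial insert the 135-amino-acid insert described in the introduction, that each residue contributes 3 bp, and that the extra epithelial sequence lies between AMP1 and AMP3. The constant and function names are illustrative.

```python
BP_PER_CODON = 3

# Reference product sizes taken from the text (approximate, in bp).
HEMOPOIETIC_AMP1_AMP2 = 480   # no insert, primers AMP1 + AMP2
CD44B_AMP1_AMP3 = 420         # CD44B insert, primers AMP1 + AMP3

# Insert lengths in amino acids, as given in the introduction (assumed mapping).
INSERT_AA = {"CD44B": 69, "epithelial": 135}


def amp1_amp2_product(variant: str) -> int:
    """Predicted AMP1/AMP2 product: insert-free size plus the whole insert."""
    return HEMOPOIETIC_AMP1_AMP2 + BP_PER_CODON * INSERT_AA[variant]


def amp1_amp3_product(variant: str) -> int:
    """Predicted AMP1/AMP3 product: CD44B size plus any extra insert sequence."""
    extra_aa = INSERT_AA[variant] - INSERT_AA["CD44B"]
    return CD44B_AMP1_AMP3 + BP_PER_CODON * extra_aa


if __name__ == "__main__":
    for variant in INSERT_AA:
        print(variant, amp1_amp2_product(variant), amp1_amp3_product(variant))
```

The computed values differ from the rounded figures in the text by only a few base pairs, which is within what "approximately" allows on an agarose gel.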
Our interpretation of these results was that transcripts encoding multiple different CD44 variants were present in many cell types and that these variants were assembled by tandem insertions of different sequence "cassettes." PCR amplification of the region spanning exon 1 and the 3' breakpoint of the insert site using the AMP4 and AMP2 primers yielded only the predicted 235-bp product from the three representative cDNA templates tested (Fig. 3) suggesting that no further sequence units were inserted 3' to exon 1 in any of the CD44 variants described here.

The Multiple CD44 Transcripts Are Generated by Tandem Insertion of Individual Sequence Cassettes into a Discrete Insert Site-In order to define the boundaries of the putative insert site and to characterize the inserted sequences, the PCR products generated with the AMP1 and AMP3 primers were cloned and sequenced in M13. Five different variant inserts, termed B-F, were subsequently defined (Figs. 4 and 5). Three of the variants (C, E, and F) were new and have not been described previously. Variant B corresponded to the CD44B isotype isolated earlier in this study by expression cloning (and recently by Dougherty et al. (22)), and variant D was apparently identical to the epithelial form of CD44 isolated by Stamenkovic et al. (17) except for a single GC inversion (position -5 of the sequences in Fig. 5) that converts the arginine in the published sequence to an alanine residue. Interestingly, the sequence of the epithelial isoform isolated recently from human keratinocytes (18) also contains an alanine residue at this position.
Comparison of the inserts from each of the five cloned variants allowed the identification of at least five individual sequence cassettes (Fig. 4). The simplest of the insert variants (CD44B) contained only cassette 1 while the most complex (variants E and F) each contained four cassettes arranged in tandem (Fig. 4). Furthermore, the insert in variant C was identical to the insert within the epithelial form (variant D) except for the absence of the first 35 amino acids from the NH2 terminus (Fig. 5 and references). The most plausible explanation for these findings is that the sequence inserts or cassettes described here represent individual exons that are alternatively spliced to generate different CD44 isotypes. This explanation is further supported by the occurrence of sequences conforming closely to the conserved exon splice-donor consensus (C38%/A39%)AG77% at the 3' ends of many of the putative exons and by the fact that each of the inserts maintains the CD44 open reading frame (see Fig. 5). In the case of both exons 3 and 5 (variants C and F, respectively) the 3' splice site occurs within a codon and changes the identity of the encoded amino acid. This provides an explanation for the substitution of aspartic acid for asparagine at residue +45 of the variant F insert and the substitution of lysine for glutamine at residue +1 of the variant C insert (Fig. 5).
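Two points in the preceding paragraph lend themselves to a small illustration: an insert preserves the open reading frame only if its length is a multiple of three, and a splice site that falls inside a codon lets the neighboring exon decide which amino acid is encoded. The sketch below uses hypothetical exon fragments (they are not the CD44 sequences) and a codon table deliberately restricted to the four codons needed here.

```python
# Minimal codon table: only the codons used in this illustration.
CODONS = {"AAT": "N", "GAT": "D", "CAG": "Q", "AAG": "K"}


def keeps_reading_frame(insert_length_bp: int) -> bool:
    """An insert leaves the downstream frame intact if its length is 3n."""
    return insert_length_bp % 3 == 0


def junction_codon(upstream_exon: str, downstream_exon: str, phase: int) -> str:
    """Return the codon that spans an exon-exon junction.

    `phase` is the number of bases (1 or 2) that the upstream exon
    contributes to the split codon.
    """
    if phase not in (1, 2):
        raise ValueError("phase must be 1 or 2 for a codon split by a junction")
    return upstream_exon[-phase:] + downstream_exon[: 3 - phase]


if __name__ == "__main__":
    downstream = "ATGGC"          # hypothetical start of the downstream exon
    for upstream in ("GCCA", "GCCG"):   # hypothetical 3' ends of two exons
        codon = junction_codon(upstream, downstream, phase=1)
        print(upstream, "+", downstream, "->", codon, CODONS[codon])
```

With the same downstream exon, an upstream exon ending in A gives AAT (asparagine) while one ending in G gives GAT (aspartic acid), which is the kind of junction-dependent substitution described above; CAG versus AAG gives the glutamine/lysine case in the same way.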
Each of the five putative exons contained a high proportion of serine and threonine residues and, with the exception of exon 5 (variant F), all contained the tripeptide motifs that mark potential O-glycosylation sites (Fig. 5). By contrast, only exon 1 contained the consensus N-X-(S/T) sequence for N-glycosylation. Interestingly, exon 4 also contained a copy of the SGXG motif (23) that represents the potential site for O-glycosylation with chondroitin sulfate. The minimal consensus motif SG is also present at the COOH terminus of exon 1 (Fig. 5) that was originally identified in the epithelial form of CD44 (17). The effects of inserting these highly glycosylated sequences on the biological functions of CD44 are not clear. However, recent evidence indicates that in the case of the epithelial form of CD44 (CD44D) the presence of the 135-amino-acid insert (defined here as exons 1-3) decreases the affinity of the molecule for hyaluronic acid (7). It is also significant that the insert site defined in the present study is located within the same region of the extracellular domain that appears to be modified by chondroitin sulfate addition and by conventional O-glycosylation (13,19).
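The sequence motifs named above can be located with simple pattern matching. The following sketch scans a peptide for the N-X-(S/T) sequon, the SGXG motif, and the minimal SG motif, and reports the serine/threonine fraction. The peptide is hypothetical (it is not a CD44 exon), and the exclusion of proline at the X position of the sequon is a commonly applied refinement that is assumed here rather than stated in the text.

```python
import re

# Motif patterns; the proline exclusion in the sequon is an assumption.
MOTIFS = {
    "N-X-(S/T)": r"N[^P][ST]",
    "SGXG": r"SG.G",
    "SG": r"SG",
}


def find_motifs(protein: str) -> dict:
    """Return 1-based start positions of each motif, including overlaps."""
    hits = {}
    for name, pattern in MOTIFS.items():
        # A lookahead lets overlapping occurrences be reported.
        regex = re.compile(f"(?=({pattern}))")
        hits[name] = [m.start() + 1 for m in regex.finditer(protein)]
    return hits


def ser_thr_fraction(protein: str) -> float:
    """Fraction of residues that are serine or threonine."""
    return (protein.count("S") + protein.count("T")) / len(protein)


if __name__ == "__main__":
    peptide = "TANESTSGSGHT"  # hypothetical example peptide
    print(find_motifs(peptide))
    print(ser_thr_fraction(peptide))
```

A match only marks a potential modification site; whether a site is actually glycosylated depends on context that a pattern scan cannot capture.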

[Fig. 5 sequence panels: nucleotide and deduced amino acid sequences of Insert C, Insert D, Insert E, and Insert F]

We have examined the remainder of the CD44 sequence for further insertions both 5' and 3' of the "membrane-proximal" site in a large panel of different tissues using PCR (AMP5, AMP6, AMP7, and AMP8 primers, see "Materials and Methods"). With the exception of the splice site within the cytoplasmic tail described by Goldstein et al. (35), no other variable insert sites were detected (not shown).
One of the Exons Represents the Human Homologue of an Exon Present within the Rat Metastatic CD44 Variant-None of the five variant exon sequences was found to be homologous with any sequence in the GenBank database. However, comparison with the cDNA encoding the recently described rat CD44 variant that has been shown to play a causal role in pancreatic and mammary tumor metastasis (12) (see also "Discussion") revealed a striking similarity with the sequence of exon 5 (Fig. 6). Furthermore, the region of homology which we identified was located within the 162-amino-acid insert of the rat CD44 that appears to be required for the metastatic growth promoting properties of the molecule (12). Altogether 29 out of 44 amino acids in exon 5 were identical to sequence at the COOH terminus of the rat CD44 (see Fig. 6 and Ref. 12). This result suggests that CD44 variants containing exon 5 may also play a role in human tumor metastasis.
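The 29-of-44 figure corresponds to roughly two-thirds sequence identity. As a minimal sketch of how such a value is computed from a pairwise alignment, the function below counts identical positions over the alignment length; the aligned strings in the usage example are hypothetical and are not the human or rat CD44 sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of alignment columns holding the same residue in both rows.

    Both strings must already be aligned (equal length, '-' for gaps);
    columns containing a gap never count as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Identity reported in the text: 29 identical residues out of 44.
REPORTED_IDENTITY = 100.0 * 29 / 44

if __name__ == "__main__":
    print(round(REPORTED_IDENTITY, 1))
    print(percent_identity("STTLE", "STALE"))  # hypothetical aligned pair
```

Other conventions divide by the shorter sequence or by the number of ungapped columns, so identity values should always be quoted together with the denominator used.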
We have not yet identified sequences homologous to the NH2-terminal portion of the rat metastatic variant insert although it is highly likely that additional exons containing these sequences will be found in other human CD44 variants. One possibility is that such exons are only rarely spliced into transcripts containing exon 1 (e.g. the major rat variant contains no additional insertions besides the metastatic exons (12)) and thus would not have been amplified efficiently by the PCR primers used in our strategy.
The usage of exons 4 and 5 in transcripts from different tissues was investigated by probing nitrocellulose blots of CD44 PCR products (AMP1 and AMP3 primers) with oligonucleotides specific for the individual exons. The results (Fig. 7) show that in the case of the exon 4 probe PEX4 (see "Materials and Methods") three bands of 560, 660, and 760 bp were detected specifically in metastatic mammary tumor cell transcripts (Fig. 7a). Interestingly, the two smaller products were present only in the metastatic mammary tumor line MDA-231 (Fig. 7a). These results were confirmed using a second probe, complementary to a different part of the exon 4 sequence (data not shown). In contrast, the exon 5 probe PEX5 (see "Materials and Methods") hybridized with transcripts present in each of the tumor cell samples (with the exception of the melanoma, see Fig. 7b) in addition to the B-lymphoblastoid cell line JY and the monocyte cell line U-937 (Fig. 7b). In this case up to five transcript sizes were detected, ranging in size from 660 to 1200 bp, and different combinations of these were present in each cell line.
Curiously, the transcript levels appeared to be much higher in the metastatic mammary tumor line MDA-231 than in any of the other tumors. This overall pattern of exon 5 usage was similar to that reported recently for the full rat metastatic variant insert (12). Interestingly, this rat CD44 insert was also reported to be present in a variety of different sized transcripts, and these were expressed as different sized protein products.

DISCUSSION
The results presented in this study provide strong support for the hypothesis that the structural and functional heterogeneity which characterizes the CD44 molecule (3,4,14,15) is generated by a mechanism of alternative splicing. The splicing appears to be confined to a single insert site in the extracellular domain of the molecule located within a highly glycosylated region that lies outside the hyaluronic acid-binding site. The insertion of multiple different exons into this single site in CD44 either alone or in tandem should allow a considerable number of sequence variants to be generated. Utilizing only the five putative exons described here, there is the potential for 26 different variants, limited only by the order in which the putative exons would be spliced. This number probably represents an underestimate of the true diversity of CD44 isotypes since it is likely that additional exons will be identified based on the number of different sized products amplified from cDNAs in our PCR strategy.

Fig. 7. The figure shows the autoradiographic images of nitrocellulose blots containing the PCR products (AMP1 and AMP3 primers) amplified from the various cDNAs shown after hybridization with the 32P-labeled probes PEX4 (panel a) or PEX5 (panel b) that specify exons 4 and 5, respectively.

A further level of diversity would also be generated by the particular combination of different CD44 variants expressed on individual cells. The potential for such diversity is clearly present in the cell lines tested but remains to be confirmed at the single cell level by PCR using the RNA from individual cells as template.
Mechanisms of alternative splicing are also used to generate diversity in the extracellular domains of other cell surface glycoproteins including fibronectin (36), the neural cell adhesion molecule (NCAM) (37), and the leukocyte common antigen CD45 (38, 39). Unlike CD44, both fibronectin and NCAM contain at least two major extracellular splice sites and the various combinations of exons allow for between 12 and 24 potential protein isoforms, respectively. Splicing of the CD44 is more similar to that found in CD45, where eight isoforms are generated from a single gene by alternative use of only three variable exons within a single extracellular splice site. However, the number of alternative exons and hence the extent of the potential diversity in CD44 are clearly much greater.
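The isoform counts discussed in this section can be reproduced by enumerating which variable exons are included while keeping their genomic order fixed. The text does not spell out how the figure of 26 CD44 variants was obtained; one counting that reproduces it, offered here as an assumption rather than as the authors' stated method, is the number of order-preserving combinations of two or more of the five exons. Allowing single-exon inserts as well gives 31. The same enumeration with three variable exons, including the form with no variable exon, gives the eight CD45 isoforms. Exon labels and function names are illustrative.

```python
from itertools import combinations


def ordered_inserts(exons, min_exons=1):
    """Enumerate inserts built from a subset of exons in fixed order.

    `combinations` preserves the input order, so each subset appears once,
    with its exons in the same relative order as in `exons`.
    """
    return [
        combo
        for k in range(min_exons, len(exons) + 1)
        for combo in combinations(exons, k)
    ]


CD44_EXONS = ("exon1", "exon2", "exon3", "exon4", "exon5")
CD45_EXONS = ("A", "B", "C")  # placeholder labels for three variable exons

if __name__ == "__main__":
    print(len(ordered_inserts(CD44_EXONS, min_exons=1)))  # any non-empty insert
    print(len(ordered_inserts(CD44_EXONS, min_exons=2)))  # tandem inserts only
    print(len(ordered_inserts(CD45_EXONS, min_exons=0)))  # includes no-exon form
```

Each additional variable exon doubles the number of possible combinations, which is why a single insert site with five or more exons can exceed the diversity available to CD45.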
Some general patterns for the tissue specificity of the CD44 variants were evident from our studies. For example, the so-called hemopoietic variant of the CD44 (CD44A) was found to be expressed on virtually all the cell lines surveyed including most of the tumor cells. By contrast, expression of the CD44B variant, although present in most of the hemopoietic tissues, was absent from many of the tumor cell lines. The other new CD44 variants that contained exons 4 and 5 showed differing degrees of tissue specificity. Our initial data on exon 4 suggest a relatively restricted usage of this element on mammary tumor tissue. Exon 5, which is homologous to part of the rat metastatic variant insert (12), was particularly abundant in transcripts from mammary tumors but, unlike exon 4, was present also in the other tumor types in addition to macrophages (U-937) and some B-cells (JY). This particular pattern of expression is quite similar to that reported for the rat metastatic CD44 variant, and it is tempting to speculate that CD44 molecules containing exon 5 may play an analogous role in human tumor metastasis.
The splicing of exons into the extracellular site described in the present study may subserve a number of different functions. One interesting possibility is that the spliced exons could modulate the ligand binding properties of the different variants. Indeed, it has already been demonstrated that the presence of the 135-amino-acid insert within the membrane proximal region of the epithelial variant radically reduces the capacity for hyaluronic acid binding that resides in the NH2-terminal domain of the molecule (7). Furthermore, in the rat a 162-amino-acid insertion at a similar site of the molecule generates a variant of the CD44 molecule that confers metastatic growth potential when overexpressed in non-metastatic tumor cells (12). Significantly, the metastatic growth potential of this rat CD44 variant can be reversibly blocked by a monoclonal antibody that binds to an epitope within the spliced insert (40). The introduction of new exons into the membrane proximal insert site of the human CD44 molecule may also have significant effects on the trafficking of lymphocytes between the bloodstream and lymph nodes. For example, it is interesting that the insert site is located within the region of the CD44 molecule that contains the epitope recognized by the monoclonal antibody Hermes 3 (19). This antibody blocks the binding of lymphocytes expressing the gp90Hermes (CD44) antigen to the high endothelium of mucosal lymph nodes in vitro (2), an interaction that appears to involve recognition of CD44 by a 58-66 kDa glycoprotein termed the mucosal vascular addressin (3).
Each of these results suggests that the membrane proximal region of the CD44 molecule plays a critical role in determining its ligand binding properties. The introduction of new exons into this region could modulate adhesiveness either by inducing a conformational change in the hyaluronic acid-binding site or by generating entirely new binding sites for ligands that have yet to be identified. The particular choice of exon usage in an individual cell may be triggered by signals in the surrounding environment during lymphocyte maturation, and this could provide a structural basis for the specificity of lymphocyte homing. Moreover, the particular set of CD44 variants expressed on any one cell may represent a precise "postal code" directing the final destination of migrating cells as well as dictating the metastatic potential of human tumors. The potential for diversity revealed within this single site of the extracellular domain of the CD44 provides the structural basis for the specificity that may underlie these important processes.