Structure of the human gene encoding granule membrane protein-140, a member of the selectin family of adhesion receptors for leukocytes.

GMP-140, an inducible granule membrane protein of platelets and endothelial cells, is a member of the selectin family of cell surface receptors that mediate interactions of leukocytes with the blood vessel wall. These molecules all contain an N-terminal lectin-like domain, followed by an epidermal growth factor-like domain, a variable number of consensus repeats related to those in complement-binding proteins, a transmembrane domain, and a cytoplasmic tail. Two variant cDNAs for GMP-140 have been identified, one predicting a soluble form of the molecule lacking the transmembrane domain and the other predicting a molecule containing eight instead of nine consensus repeats. Here we describe the organization of the human gene encoding GMP-140, which spans over 50 kilobase pairs and contains 17 exons. Almost all exons encode distinct structural domains, including the lectin-like domain, the epidermal growth factor-like domain, each of the nine consensus repeats, and the transmembrane region. Each of the two deletions found in the variant cDNAs is precisely encoded by an exon, suggesting that these forms of GMP-140 are derived from alternative splicing of mRNA. By using the polymerase chain reaction, transcripts encoding the putative soluble form of GMP-140 can be amplified from both platelet and endothelial cell RNA. The structure of the GMP-140 gene supports the concept that the selectins evolved as a result of exon duplication and rearrangement.

The selectins are structurally related cell surface receptors that mediate interactions of leukocytes with the blood vessel wall (1-4). Three members of this family are currently known. The first is the murine Mel 14 antigen (3, 4) or human LAM-1 (Leu-8/TQ1 antigen) (5-8). This molecule, which is found on neutrophils, monocytes, and a subset of lymphocytes, mediates homing of lymphocytes to high endothelial venules of peripheral lymph nodes and may also promote leukocyte adhesion to endothelium at inflammatory foci (9, 10). The second selectin is ELAM-1, a human cytokine-inducible endothelial cell receptor for neutrophils (2). The third is GMP-140 (PADGEM protein, CD62), a membrane glycoprotein located in secretory granules of human platelets and endothelium (1). When these cells are activated by agonists such as thrombin, GMP-140 is rapidly redistributed to the cell surface (11-14), where it mediates adhesion of neutrophils and monocytes (15-17).
Each of the selectins contains an N-terminal domain homologous to Ca2+-dependent lectins, followed by an epidermal growth factor (EGF)-like domain, a variable number of repeating units similar to those in complement-binding proteins, a transmembrane segment, and a short cytoplasmic tail (1-4). The extensive sequence identity and shared domain organization of the selectins suggest that they comprise a gene family produced by duplication and rearrangement of ancestral exons. This hypothesis is strengthened by the tight clustering of all three genes on chromosome 1 in both mouse and man (18).
The GMP-140 cDNA sequence predicts a protein composed of a number of discrete modular elements (1). Analysis of the genomic organization indicates that individual exons tend to encode each of these domains, as illustrated schematically in Fig. 2. Exon 1 ends with the published 38 bp of 5'-untranslated cDNA sequence followed by the ATG codon for the methionine residue that initiates translation (1). A large intron of 11.3 kilobase pairs separates this exon from exon 2, which encodes the first 30 amino acids of the signal peptide. Exon 3 encodes the remaining 10 amino acids of the signal peptide joined to the sequence encoding the lectin domain. Following exon 3 is another large intron spanning at least 13.1 kilobase pairs, whose entire length has not been determined.
Next is a series of exons that encode the EGF domain and each of the nine consensus repeats. Exon 14 encodes 40 amino acids surrounding the 24-residue hydrophobic transmembrane domain, including the first seven charged amino acids of the cytoplasmic segment. Two additional exons encode the remaining 28 amino acids of the cytoplasmic domain. Finally, exon 17 begins after the stop codon and contains all of the 3'-untranslated sequence. All but two of the intron-exon splice junctions are split between the first and second nucleotides of codons (phase 1). The exceptions are exon 1, which ends after the three bases of the ATG codon for the initiating methionine (phase 0), and exon 15, which encodes the first part of the cytoplasmic tail and ends after the first two bases of a codon (phase 2).
gene were isolated and overlapping genomic DNA inserts were subcloned into plasmids for analysis as described under "Experimental Procedures." The sequence of the coding regions of the GMP-140 gene matched that determined previously from endothelial cell cDNAs (1). Five of eight polymorphisms previously noted in the cDNAs were also observed in the genomic sequence, as well as a new polymorphism in which C replaces A at base 2304. This nucleotide change results in a codon encoding a proline instead of a threonine in the ninth repeat of the molecule. All nine polymorphisms thus far identified are in sequence encoding the consensus repeats.
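The threonine-to-proline change is what the standard genetic code predicts for an A-to-C substitution in the first position of a codon. The following Python sketch illustrates this; because the actual codon at base 2304 is not given here, it enumerates all four threonine codons, and the helper name is purely illustrative.

```python
# Standard genetic code: threonine is encoded by ACN and proline by CCN
# (N = any base). The actual codon at base 2304 is not stated in the text,
# so all four threonine codons are considered.
THR_CODONS = {"ACA", "ACC", "ACG", "ACT"}
PRO_CODONS = {"CCA", "CCC", "CCG", "CCT"}


def substitute_first_base(codon: str, new_base: str) -> str:
    """Return the codon with its first nucleotide replaced (illustrative helper)."""
    return new_base + codon[1:]


# Replace the first-position A with C in every threonine codon.
mutated = {substitute_first_base(codon, "C") for codon in THR_CODONS}

# Every threonine codon becomes a proline codon.
print(mutated == PRO_CODONS)
```

An A-to-C change in the third position of a threonine codon would be silent, so a first-position substitution is the only single A-to-C change that converts threonine to proline.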
Variant Forms of GMP-140 Generated by Alternative Splicing-We previously identified two in-frame deletions in certain independent GMP-140 cDNA clones that predicted alternative forms of the molecule (1). One was a 186-bp deletion encoding 62 amino acids in the seventh consensus repeat, predicting a molecule containing eight instead of nine repeats. The other was a 120-bp deletion encoding 40 amino acids surrounding the transmembrane domain, predicting a soluble form of GMP-140.
Genomic analysis indicates that exon 11 encodes the 186-bp deletion and exon 14 encodes the 120-bp deletion. Therefore the variant forms predicted from cDNA sequencing are most likely derived from alternative splicing of precursor RNA transcripts, as illustrated in Fig. 3A. The different cDNAs encoding GMP-140 were isolated from a human endothelial cell library (1). To determine whether similar transcripts could be generated by megakaryocytes, we used the polymerase chain reaction to amplify the regions surrounding the seventh repeat or the transmembrane domain from both platelet and endothelial cell RNA. Fig. 3B demonstrates that two products of 357 and 237 bp were amplified from platelet RNA using primers from the exon encoding the ninth consensus repeat and the exon encoding the second portion of the cytoplasmic tail. Products of the same size were also amplified from endothelial RNA (not shown). The size difference of 120 bp between the two amplified fragments was consistent with the predicted deletion of the 120-bp transmembrane exon in the smaller product. To confirm this, the fragment was gel purified and sequenced.
As shown in Fig. 3B, ... ninth repeat to exon 15 encoding the first portion of the cytoplasmic tail. This indicates that transcripts encoding a putative soluble form of GMP-140 are produced by megakaryocytes as well as endothelial cells. Primers from exons encoding the sixth and eighth repeats were used to amplify the region surrounding the seventh repeat (Fig. 3C). A prominent 543-bp fragment was produced from both platelet and endothelial RNA. A minor fragment of 357 bp was also amplified from endothelial RNA. The size difference of 186 bp between the minor and major fragments was consistent with the predicted deletion of the 186-bp exon encoding the seventh repeat in the smaller product. This is supported by the observation that both the larger and smaller fragments hybridized with an oligonucleotide containing a sequence from the sixth repeat (Fig. 3C) ... repeat (not shown). No 357-bp fragment could be detected among the products amplified from platelet RNA.
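The expected sizes of the exon-skipped amplification products follow from simple subtraction. The short Python sketch below uses only the fragment and exon lengths reported above; the function and variable names are illustrative.

```python
# Lengths (bp) reported in the text for the two alternatively spliced exons.
EXON_11_BP = 186  # seventh consensus repeat
EXON_14_BP = 120  # region surrounding the transmembrane domain


def skipped_product_size(full_length_bp: int, exon_bp: int) -> int:
    """Expected PCR product size when one internal exon is absent from the transcript."""
    return full_length_bp - exon_bp


# Primers in the ninth-repeat exon and the second cytoplasmic exon:
# the full-length product is 357 bp.
soluble_form = skipped_product_size(357, EXON_14_BP)

# Primers in the sixth- and eighth-repeat exons:
# the full-length product is 543 bp.
eight_repeat_form = skipped_product_size(543, EXON_11_BP)

print(soluble_form, eight_repeat_form)
```

The computed values match the observed smaller fragments of 237 bp and 357 bp, respectively.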

DISCUSSION
The observation that most exons in the GMP-140 gene encode distinct structural domains supports the concept that the selectins arose from rearrangement and duplication of preexisting exons. It is noteworthy that all but two of the GMP-140 exons contain symmetrical phase 1 intron-exon boundaries, which should facilitate duplication and insertion into other genes. It has been noted previously that exons encoding EGF-like structures and "complement binding" repeats in other proteins contain phase 1 boundaries (23). The carbohydrate-binding domain in Ca2+-dependent lectins comprises approximately 130 residues (reviewed in Ref. 24). Three different exons encode this domain in two members of this family, the rat hepatic lectin (25) and the low affinity Fc receptor for IgE (26). These exons have apparently been fused into single exons encoding equivalent domains in the rat mannose binding protein (27) and the human pulmonary surfactant gene (28). A similar fusion may have produced exon 3 of GMP-140, which encodes the last 10 amino acids of the signal peptide joined to the 118 residues of the lectin domain. The signal-peptide segment shares weak sequence homology with the first 10 residues of lectin domains in other proteins (25, 27). It is conceivable that these residues were part of the original lectin domain in GMP-140, but evolved to become part of the signal peptide when they were not required for function in the mature protein. In addition, exon 2, which encodes the majority of the GMP-140 signal peptide, shares similarities in size and sequence with an exon upstream of the carbohydrate-binding domain exons in some Ca2+-dependent lectins (25, 27). The function of this segment in the other proteins is unknown, but it is not required for carbohydrate recognition (27). Perhaps this exon in GMP-140 originally encoded another function but evolved to form the remainder of the current signal peptide. This might explain the need for an additional upstream exon to encode the ATG codon that initiates translation.
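The reason symmetrical exons can be duplicated, inserted, or skipped without disturbing the reading frame is a matter of modular arithmetic. The following Python sketch takes intron phase to be the number of codon bases (0, 1, or 2) that precede the intron, as in the phase definitions given above. The function names are illustrative, and the 100-bp exon is a hypothetical counterexample, not a GMP-140 exon.

```python
def downstream_phase(upstream_phase: int, exon_length_bp: int) -> int:
    """Phase of the intron after an exon, given the phase of the intron before it."""
    return (upstream_phase + exon_length_bp) % 3


def is_symmetrical(upstream_phase: int, exon_length_bp: int) -> bool:
    """True if the exon is flanked by introns of the same phase.

    Removing, duplicating, or inserting such an exon leaves the
    downstream reading frame unchanged.
    """
    return downstream_phase(upstream_phase, exon_length_bp) == upstream_phase


# Exon lengths reported in the text, each flanked by phase 1 introns.
for name, length_bp in [("exon 11 (seventh repeat)", 186),
                        ("exon 14 (transmembrane region)", 120)]:
    print(name, is_symmetrical(1, length_bp))

# Hypothetical 100-bp exon after a phase 1 intron: the next intron would be
# phase 2, so skipping this exon would shift the reading frame.
print("hypothetical 100-bp exon", is_symmetrical(1, 100))
```

Under this convention an exon is symmetrical exactly when its length is a multiple of 3, which holds for both the 186-bp and the 120-bp exon.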
Two variant forms of GMP-140 appear to be generated by alternative splicing of specific exons. In the first, removal of exon 11 encoding the seventh repeat yields a protein containing eight instead of nine repeats. A similar splicing event yields transcripts encoding either 15 or 16 repeats in CR2, a member of the complement regulatory protein family whose members contain repeating units of related structure (29). In the second, deletion of exon 14 encoding the transmembrane domain produces a putative soluble form of GMP-140. Similar alternative splicing generates transcripts that encode soluble forms of two other cell surface receptors, HLA-A2 (30) and the Fc-γRII receptor (31).
The functional significance, if any, of the removal of the seventh repeat by alternative splicing is unknown. With one exception, each of the nine exons encoding the repeats in GMP-140 contains 186 bp encoding 62 amino acids. The exception is exon 12 which contains sequence for an additional eight amino acids fused to the rest of the eighth repeat. Perhaps the secondary structure of the precursor mRNA in the region between exons 11 and 12 allows for occasional removal of exon 11. The eight-repeat form of GMP-140 appears to be rare. Only one such cDNA was noted out of six endothelial cDNAs initially examined (1). Furthermore, even with repeated amplifications during the polymerase chain reaction, only a minor band corresponding to this variant was noted in endothelial RNA. No corresponding structure was noted in platelet RNA. However, the amplification was performed with platelet RNA from a single individual, whereas the endothelial RNA was pooled from many donors. Although precise quantitation has not been performed, transcripts encoding the putative soluble form of GMP-140 may be as common as those encoding the membrane form. The deletion of the exon encoding the transmembrane domain was noted in two out of four independent endothelial cDNAs (1). Furthermore, this transcript was easily amplified from both platelet (Fig. 3B) and endothelial cell RNA. This suggests a potentially important role for a secretable form of GMP-140 that is synthesized by both megakaryocytes and endothelium.
A similarly synthesized soluble variant has not been described for the other selectins. Although a soluble form of the homing receptor (Mel 14 antigen, LAM-1) is released from activated leukocytes, it is probably generated by proteolytic cleavage of the membrane-bound molecule (9, 10, 32, 33).
The structure of the human gene encoding LAM-1, another selectin, has been recently reported (34). The intron-exon organization of this gene demonstrates a striking similarity to that found in GMP-140, which suggests that a similar pattern will be found in the gene encoding ELAM-1 and perhaps in genes of other selectins yet to be identified.
The genomic structures of LAM-1 and GMP-140 support the role of exon duplication and rearrangement in generating this newly described family of proteins that facilitate cellular interactions during inflammation.