Structure of Human Clathrin Light Chains
CONSERVATION OF LIGHT CHAIN POLYMORPHISM IN THREE MAMMALIAN SPECIES*


Complementary DNAs (cDNA) encoding the brain and non-brain forms of the human clathrin light chains LCa and LCb have been isolated, sequenced, and compared with their homologues in cow and rat. The significant differences that distinguish LCa from LCb and the brain from non-brain forms show remarkable preservation in all three species. These features include the position and sequence of the brain-specific inserts, a totally conserved region of 22 residues near the amino terminus, the LCb-specific phosphorylation site, the heavy chain binding site, and a distinctive pattern of cysteine residues near the carboxyl terminus. Unorthodox sequences for translation initiation and polyadenylation are found for LCb, contrasting with LCa, which exhibits orthodox regulatory sequences.
Small insertions in human LCa revealed a duplicated sequence of 13 residues that flanks the 22-residue conserved region. Only the carboxyl-terminal copy of this sequence is present in LCb. All sequences are consistent with the heavy chain binding site comprising an α-helical central region of the light chains. The hydrophobic face of this helix, which is presumed to interact with the heavy chain, is highly conserved between LCa and LCb, whereas the hydrophilic face shows considerable divergence. To help define the carboxyl-terminal limit of the heavy chain binding region, the epitope recognized by the CVC.6 monoclonal antibody was localized to residues 192-208 of LCa, with glutamic acid 198 being of most importance.
The faithful preservation of clathrin light chain polymorphism in three mammalian species provides evidence supporting a functional diversification of the brain and non-brain forms of LCa and LCb.
In cells of higher vertebrates clathrin-coated pits and vesicles participate in receptor-mediated pathways of endocytosis and intracellular transport of macromolecules. Polymerization of clathrin triskelions on the cytoplasmic face of membranes occurs concomitantly with the clustering of receptors and formation of a coated pit. Invagination then permits formation of a closed vesicle coated with a characteristic clathrin lattice of hexagons and pentagons. On entry of the vesicle into the cytoplasm the clathrin coat is depolymerized and recycled (1, 2).
* This work was supported by Grant CD 234 from the American Cancer Society. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
The nucleotide sequence(s) reported in this paper has been submitted to the GenBank™/EMBL Data Bank with accession number(s) 5041 74.
‡ Supported by a postdoctoral fellowship from the Muscular Dystrophy Association.
Each clathrin triskelion consists of three heavy chains and three light chains. In the cow two genes code for distinct light chains, LCa and LCb,¹ that are expressed in all tissues. These genes undergo tissue-specific mRNA splicing in the brain to yield larger forms of LCa and LCb, containing additional insertion sequences of 30 and 18 amino acids, respectively (3, 4).
The precise role of the clathrin light chains is uncertain. In vitro evidence that they bind calmodulin and are essential for the activity of an uncoating ATPase points to them being regulatory elements in clathrin function. Neither is the purpose of the light chain polymorphism established. One clue is that LCb is specifically phosphorylated both in vitro (5) and in vivo (6) by a casein kinase II-like activity.
Clathrin has also been studied in yeast, a single-celled eukaryotic organism (7, 8). The role of clathrin in endocytosis, secretion, and intracellular transport has still to be defined but, given the nature of this organism, is unlikely to involve the targeted uptake of a large number of specific proteins. It is therefore of interest that yeast clathrin contains only a single light chain.² This raises the possibility that the evolution of light chain polymorphism is associated with the development in complex multicellular organisms of receptor-mediated systems for the uptake of multiple macromolecules. The validity of this idea will hinge on the extent to which polymorphic features of the clathrin light chains are preserved in different species. To examine this question we have determined the primary structure of four forms of human clathrin light chains and compared them with light chains from the cow and the rat.

MATERIALS AND METHODS
cDNA Libraries-Human brain-type light chains were isolated from a human retinal cDNA library kindly supplied by Dr. J. Nathans (9). As a source of non-brain light chains, a cDNA library was constructed from the human lymphocyte cell line T 7527 by the method of Gubler and Hoffman (10). Size-selected double-stranded cDNA (0.6 to 4 kb) was ligated into the EcoRI site of the vector λgt10 and propagated in Escherichia coli strain BNN 102 (11). Both libraries were screened with full-length bovine light chain clones, labeled by the primer extension method (12) to a specific activity of 5 × 10' cpm/μg.
Sequencing Strategy-Positive clones were isolated and inserts subcloned in both orientations into M13mp18. Sequences were determined by the dideoxy chain termination method (13). Complete sequences on both strands were acquired by sequencing restriction enzyme subfragments from within the clones and by the use of internally priming oligonucleotides. In many cases oligonucleotide primers originally designed for sequencing the bovine light chains (3) worked satisfactorily for the human clones. Sequences were compiled on a VAX computer and analyzed using the programs of the University of Wisconsin Genetics Computer Group (14).
¹ The abbreviations used are: LCa and LCb, clathrin light chains a and b, respectively.
² L. Silveira and R. Schekman, personal communication.
Synthetic Peptides and Assay-Peptides were made using Fmoc chemistry on a Biosearch 9600 automatic synthesizer. Solid phase radioimmune assays were as described (15) with the following modifications. To reduce background, 10% normal rabbit serum plus 4 mg/ml bovine serum albumin were included in the wash buffers and the solutions containing the first step monoclonal antibodies and the second step 125I-F(ab')2 fragments of rabbit anti-mouse immunoglobulin. 0.02% sodium dodecyl sulfate and 0.1% Triton X-100 were also included in the wash buffers.

RESULTS AND DISCUSSION
Nucleotide Sequences of cDNA Encoding Human Clathrin Light Chains-Human retinal (9) and lymphocyte cDNA libraries were screened with full-length cDNA probes encoding bovine brain and non-brain forms of the clathrin light chains LCa and LCb. Preliminary analysis by Western blotting had shown that retina predominantly expresses the brain forms of the clathrin light chains and that Epstein-Barr virus-transformed B lymphoblastoid cell lines express the non-brain forms. Positive clones were obtained from both libraries at a frequency of about 1 in 5 × 10^4 for each probe. This permitted the isolation of full-length cDNA clones encoding the brain (retina) and non-brain (lymphocyte) forms of human LCa and LCb. The complete nucleotide sequences of these clones were determined and were compared with the sequences of cDNA encoding the corresponding light chains from cow and rat (Fig. 1, a and b). As no previous comparison of the bovine (3) and rat (4) sequences has been made, we shall make a general assessment of the similarities between the light chains in all three species.
For both LCa and LCb there is 90-94% identity in the coding region when nucleotide sequences from different species are compared. The human sequences are slightly more similar to those of the cow than the rat (Table I). The relationships between the brain and non-brain forms of LCa and LCb are preserved in the three species. Inserted sequences of 90 nucleotides for LCa and 54 nucleotides for LCb are found in precisely the same positions in the brain forms (Fig. 1, a and b). Outside this region, the non-brain forms show no nucleotide substitutions, as would be expected if the tissue-specific differences are due to differential mRNA splicing of single genes encoding LCa and LCb (3, 4). It is striking that in the case of LCa there are no nucleotide substitutions in the brain-specific insert, whereas substitutions are found in the flanking regions. For LCb there are no differences in the brain-specific inserts of cow and human. However, one coding and three silent substitutions are found with rat LCb (Fig. 2b). In the case of LCa it is clear that there has been selection against both coding and non-coding substitutions in the brain-specific inserts. Some synonymous changes may not be neutral because the concentrations of synonymous tRNA molecules vary widely, and this may affect the rate of protein synthesis or even the way a protein folds. Alternatively, the sequence of the mRNA itself may have to fold into a critical secondary structure for correct tissue-specific splicing. Partial analysis of the bovine LCa gene has revealed that the brain-specific insert is composed of two exons; the first exon encodes residues 158-175 and the second encodes residues 176-187.³ Thus, structural constraints on the LCa gene may be unusually severe in this part of the sequence because of several different splicing patterns that could potentially occur. This is consistent with the observation of Kirchhausen et al. (4), who have described a rat brain LCa cDNA clone containing only the first 18 amino acids of the brain-specific insert.
³ H. F. Seow, A. P. Jackson, and P. Parham, unpublished data.
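The distinction drawn above between coding and silent substitutions can be illustrated with a short Python sketch. It assumes two in-frame coding sequences of equal length with no gaps, and it counts differing codons rather than individual nucleotide changes. The function name `classify_substitutions` and the example codons are hypothetical and are not taken from the light chain cDNAs.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitutions(seq_a, seq_b):
    """Count silent and coding codon differences between two aligned,
    in-frame coding sequences of equal length (no gaps assumed)."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be equal length and a multiple of 3")
    counts = {"silent": 0, "coding": 0}
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        same_residue = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
        counts["silent" if same_residue else "coding"] += 1
    return counts


# Hypothetical example: codon 2 changes GAA (Glu) to GAG (Glu), a silent change;
# codon 3 changes GAA (Glu) to GAT (Asp), a coding change.
print(classify_substitutions("ATGGAAGAA", "ATGGAGGAT"))
```

Applied to a pair of aligned brain-specific inserts, a tally of this kind would separate the single coding change from the silent ones; handling of alignment gaps is deliberately left out of the sketch.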
In the 5'-untranslated regions differences were found between the brain and non-brain forms of human LCa and LCb (Fig. 1, a and b). However, we believe these reflect a cloning artifact rather than a difference in the genomic sequences encoding the brain and non-brain forms. The first 9 bases of the cDNA for human brain LCa and the first 46 bases of the cDNA for human brain LCb are inverted with respect to their non-brain counterparts. The second strand of the retinal cDNA library, which was used to isolate the brain forms, was constructed by the S1 nuclease digestion method, which is known to generate inverted sequences at the 5' end (16). In contrast, the lymphocyte cDNA library was constructed by the RNase H1 technique (10), which is not prone to this particular artifact. We therefore believe that the correct sequence is represented by the lymphocyte cDNA.
cDNA Encoding LCa and LCb Have Distinct Regulatory Sequences in the Untranslated Regions-The sequences of the 5'- and 3'-untranslated regions for LCa and LCb show little homology. This pattern is typical of that found in members of a multigene family (17). In addition, the untranslated regions have undergone a more rapid divergence between the three species than has the coding region (Table I). In particular, it was necessary to introduce gaps into each sequence in order to maximize the homology (Fig. 1, a and b). This shows that a number of insertions and deletions have occurred during the evolution of the light chain genes and suggests that these regions for the most part are under less selective pressure.
Exceptions are the sequences surrounding the initiation codons, which are well conserved between the three species. From comparison of sequences derived from various eukaryotic genes, Kozak (18) has obtained a consensus for this element. In this sequence, CC(A/G)CCATGG, the positioning of a purine three bases upstream from the ATG start codon is particularly important. For LCa there is an exact match to the consensus sequence. In contrast, the corresponding region of LCb deviates at 4 of the 6 positions, although the critical purine at position -3 is conserved. Although not unique to LCb, such extreme divergence from the consensus has only been encountered in about one-tenth of the genes examined (18).
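The comparison with the initiation consensus can be expressed as a small scoring routine. The following is a minimal Python sketch, assuming the six scored positions are the five bases upstream of the ATG plus the base immediately after it; the function names and the second example sequence are hypothetical and do not reproduce the actual LCb context.

```python
# Consensus around the start codon as quoted in the text: CC(A/G)CC ATG G.
# "R" stands for a purine (A or G); the ATG itself is not scored.
UPSTREAM_CONSENSUS = "CCRCC"   # positions -5 to -1
DOWNSTREAM_CONSENSUS = "G"     # position +4


def matches(base, symbol):
    """True if a base agrees with a consensus symbol (R = A or G)."""
    return base in "AG" if symbol == "R" else base == symbol


def kozak_deviations(context):
    """Return (number of deviating positions out of 6, purine kept at -3).
    `context` is a 9-base string: 5 upstream bases, ATG, 1 downstream base."""
    context = context.upper()
    if len(context) != 9 or context[5:8] != "ATG":
        raise ValueError("expected 5 bases + ATG + 1 base")
    scored = context[:5] + context[8]
    consensus = UPSTREAM_CONSENSUS + DOWNSTREAM_CONSENSUS
    deviations = sum(not matches(b, s) for b, s in zip(scored, consensus))
    purine_at_minus_3 = context[2] in "AG"
    return deviations, purine_at_minus_3


print(kozak_deviations("CCACCATGG"))  # exact match to the consensus
print(kozak_deviations("TTATTATGG"))  # hypothetical context with 4 deviations
```

A context that deviates at 4 of the 6 scored positions while keeping the purine at -3 corresponds to the situation described for LCb.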
A second and more dramatic example of an unusual regulatory sequence in LCb is found in the polyadenylation signal in the 3'-untranslated region. Again one finds that the LCa gene has an orthodox polyadenylation sequence of AATAAA (19) which starts 19-23 base pairs upstream from the poly(A) tail (Fig. 1a and Ref. 23). For LCb a conventional polyadenylation signal does not exist. In human and rat LCb cDNA a related sequence TTAAA occurs 18-30 base pairs upstream of the poly(A) sequence (Fig. 1b). A homologous sequence is not found in the bovine LCb cDNA which does, however, contain the sequence AAAATT about 40 bases from the poly(A) tail. It is possible that these two distinct sequences act as polyadenylation signals, although they do not correspond to any of the known variants (20).
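Locating a candidate polyadenylation signal relative to the poly(A) tail is a simple string search. The Python sketch below is illustrative only: it assumes the distance is measured from the first base of the signal to the first base of the terminal run of A residues, and the `min_tail` threshold, the function name, and the test sequence are all hypothetical.

```python
def find_signal_distances(cdna, signal="AATAAA", min_tail=10):
    """Return distances (in bases) from the start of each occurrence of
    `signal` to the start of the terminal poly(A) run.
    `min_tail` is an assumed minimum run of A's that counts as a tail."""
    cdna = cdna.upper()
    body = cdna.rstrip("A")          # everything before the terminal A run
    tail_len = len(cdna) - len(body)
    if tail_len < min_tail:
        raise ValueError("no terminal poly(A) tail found")
    tail_start = len(body)
    distances = []
    pos = cdna.find(signal)
    while pos != -1 and pos < tail_start:
        distances.append(tail_start - pos)
        pos = cdna.find(signal, pos + 1)
    return distances


# Hypothetical 3' end: 4 bases, the signal, 15 spacer bases, then a 20-base tail.
toy = "GCGC" + "AATAAA" + "C" * 15 + "A" * 20
print(find_signal_distances(toy))
```

Passing `signal="TTAAA"` or `signal="AAAATT"` searches in the same way for the unorthodox candidates discussed for LCb.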

An interestingly conserved feature in the 3'-noncoding region of LCb is the occurrence of poly(T) sequences adjacent to a GT-rich area (Fig. 1b). Several eukaryotic genes contain downstream elements containing these structures, where they function together with the polyadenylation signal to define the mRNA cleavage site. However, these elements do not normally occur in mature mRNA as is found for LCb (19, 20, 22, 23). Another curious feature of the 3'-untranslated region of LCb is the set of two inverted repeats that may be capable of forming a hairpin loop structure (Fig. 1b). Since similar sequences are found in all three species, it is possible that they may play a role in the transcriptional control of the LCb gene.
Comparison of Structure of Clathrin Light Chains-Similarities between the protein sequences of clathrin light chains from different species are high, giving a range of 95-98% identity between any pair of LCa or LCb sequences (Table II). The substitutions are generally conservative in nature. This contrasts with the range of 59-60% identity which is obtained when LCa and LCb sequences are compared. Thus, the differences between LCa and LCb, as well as their occurrence, have been conserved during the diversification of three orders of mammals. It is therefore likely that these differences reflect a functional diversification of the LCa and LCb light chains.
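Identity values of the kind quoted above depend on how gapped positions are treated. As a minimal Python sketch, assuming that columns containing a gap in either sequence are simply excluded from the comparison (a convention the text does not specify), percent identity between two aligned sequences can be computed as follows. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (an assumed convention)
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical 10-residue segments differing at one position.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKV"))
```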
A striking pattern of differences between LCa and LCb is seen in the first 24 residues of the amino terminus (Fig. 3). This region of LCb is rich in serines and is the site for LCb-specific phosphorylation, a functional property that distinguishes the two classes of clathrin light chain (24). This region of LCa contains no serines and its length differs between species, an extra five amino acids being found in the human and rat compared with the cow. The extra bases found in the human and rat represent simple repeats of adjacent sequences, a pattern of species variation that has been noted in other genes (25). Efstratiadis et al. (17) have proposed a mechanism in which tandem repeats can be lost during DNA replication.
The LCa sequences can be readily interpreted in terms of this model, which would imply that the two extra blocks of sequences were present in the common ancestor of rat, cow, and human but were subsequently lost along the lineage leading to the cow.

The presence of these extra residues combined with the substitution of alanine at position 9 in human LCa permitted identification of a similarity between residues 7-16 of LCa and 47-59 of LCb. Dot matrix analysis confirmed the significance of this homology in the nucleotide sequences (Fig. 4a). Thus, 29 of 39 base pairs are shared by nucleotides 94-132 of LCa and nucleotides 306-344 of LCb. This is greater than the 20 of 39 base pairs that are shared by positions 306-344 of LCb and positions 220-258 of LCa, which are the homologous regions when alignment is continued from the more conserved 3' end of the coding region (Fig. 4b). This repeat motif in the LCa sequence is most likely the result of a duplication event during the evolution of the LCa gene. Ongoing analysis of the exon-intron organization of the LCa gene may produce evidence in support of this hypothesis.
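The two operations used in this comparison, counting shared bases between equal-length segments and scanning two sequences with a sliding window, can be sketched in Python as follows. The window size and match threshold used for Fig. 4 are not stated here, so the `window` and `min_matches` defaults, like the function names and toy sequences, are assumptions made for illustration.

```python
def shared_positions(seg_a, seg_b):
    """Number of identical positions between two equal-length segments."""
    if len(seg_a) != len(seg_b):
        raise ValueError("segments must have equal length")
    return sum(a == b for a, b in zip(seg_a.upper(), seg_b.upper()))


def dot_matrix(seq_a, seq_b, window=5, min_matches=4):
    """Return (i, j) pairs (0-based) where the windows starting at seq_a[i]
    and seq_b[j] share at least `min_matches` positions."""
    dots = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            if shared_positions(seq_a[i:i + window], seq_b[j:j + window]) >= min_matches:
                dots.append((i, j))
    return dots


# Hypothetical sequences sharing one 7-base stretch.
print(shared_positions("GATTACA", "GACTATA"))
print(dot_matrix("TTTTGATTACATTTT", "CCGATTACACC", window=7, min_matches=7))
```

A run of dots along a diagonal marks a region of similarity; with `shared_positions` applied to two 39-base segments, the result is directly comparable to the "29 of 39" and "20 of 39" figures given above.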
At present the functional consequences of the duplicated sequences can only be guessed. However, two features may be of significance. First, the duplicated sequences of LCa flank the most conserved region (residues 23-44) of the clathrin light chains. Second is the distinction that the single LCb copy of this sequence is on the carboxyl-terminal side of the conserved region, whereas the more closely related LCa sequence is on the amino-terminal side (Fig. 3). This shuffling of sequences in the primary structure may change the juxtaposition of functionally important sites in LCa and LCb.
Heavy Chain Binding Region-All clathrin light chains share the property of binding to the clathrin heavy chain (26). Association assays in vitro show that LCa and LCb are equivalent competitors for the heavy chain, indicating that they have similar heavy chain binding regions (27). A variety of experimental and theoretical analyses point to the central region, residues 95 to 157 of the light chains, as being involved in heavy chain binding (Fig. 3). Monoclonal antibodies against this region are directed against cryptic epitopes of clathrin triskelions and inhibit in vitro association (15). This structure is predicted to be α-helical, and sequence homology is found with the α-helical domains of several intermediate filament proteins (3). In addition Kirchhausen et al. (4) have emphasized a set of heptad repeats found within this same region and suggest that the light chains form a coiled coil around the proximal arm of the heavy chain. In this region human LCa differs by 2 residues from the cow and 3 residues from the rat, whereas human LCb differs by only a single residue from the cow. These small numbers of conservative substitutions do not affect the theoretical predictions. In contrast there are 22 differences between LCa and LCb in the 63 residues of this region, including 13 non-conservative substitutions (Fig. 3). Apart from three conservative changes in the heptad repeats (at positions 95, 126, and 131), most of the differences between light chains cluster on the opposite face of the helix from that predicted to interact with the clathrin heavy chain (Fig. 5). Thus, LCa and LCb may bind to the same site on the heavy chain via the conserved heptad repeats and yet present a markedly different structure along the side of the helix that is potentially exposed to the cytosol.
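The argument that differences cluster away from the heptad positions can be made concrete by assigning each residue of an aligned helical segment to a heptad position (a to g) and tallying differences by position. The Python sketch below assumes that positions a and d form the hydrophobic core of the repeat, as is conventional for coiled coils; the register (`first_position`), the function name, and the toy sequences are hypothetical and do not reproduce the alignment of Fig. 5.

```python
HEPTAD = "abcdefg"


def differences_by_heptad_position(seq_a, seq_b, first_position="a"):
    """Tally residue differences between two aligned helical segments,
    split into core positions (a, d) and the remaining positions.
    `first_position` is the heptad position assigned to the first residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("segments must have equal length")
    offset = HEPTAD.index(first_position)
    tally = {"core": 0, "outer": 0}
    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a == b:
            continue
        position = HEPTAD[(i + offset) % 7]
        tally["core" if position in "ad" else "outer"] += 1
    return tally


# Hypothetical two-heptad segments: one difference at a d position,
# three at outer positions (b, e, f).
print(differences_by_heptad_position("LKELEDKLKELEDK", "LREIQDKLKELENK"))
```

A tally dominated by the outer positions would match the pattern described above, in which the face predicted to contact the heavy chain is conserved and the opposite face diverges.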
The precise boundaries of the heavy chain binding site are uncertain. However, at the amino-terminal side, residues 45-94 contain 7 out of 17 prolines in LCa and 5 out of the 13 prolines in LCb (Fig. 3). This region has a low probability of forming an α-helix. Moreover, this same region contains 6 out of the 12 species differences for LCa and 11 of the 14 species differences for LCb. This implies that residues 45-94 are under less functional constraint than other areas of the molecule and are therefore less likely to contribute to heavy chain binding. To define the carboxyl-terminal limits of the heavy chain binding site, the identification of residues contributing to the epitope recognized by monoclonal antibody CVC.6 was undertaken.
Glutamic Acid 198 Is Critical for the CVC.6 Epitope-We have previously shown that CVC.6, an LCa-specific monoclonal antibody, binds to a tryptic peptide obtained from bovine LCa consisting of residues 188-208 (15). Within this sequence human, bovine, and rat LCa differ only at position 198, which is glutamic acid in the cow and aspartic acid in the human and rat (Fig. 3). As CVC.6 is highly specific for bovine LCa, showing no detectable reaction with human LCa, this suggests that a single methylene group in the residue at position 198 is absolutely critical for the CVC.6 epitope.
A synthetic peptide corresponding to residues 188-208 of bovine LCa was found to bind strongly to the CVC.6 antibody in a solid phase radioimmunoassay. This showed that the epitope was purely the result of primary sequence and not due to a post-translational modification present in the previously studied "natural" peptide. The homologous human LCa peptide, differing by just the substitution of aspartic acid for glutamic acid at position 198, showed no binding over background levels as determined with an irrelevant peptide. To provide a more sensitive and quantitative comparison of the peptide-antibody interaction, an assay was used in which the capacity of soluble peptides to inhibit the binding of CVC.6 to a solid phase peptide was measured. As shown in Fig. 6, a strong and titratable inhibition was seen with the bovine peptide. In contrast the human peptide gave no inhibition even at a concentration 50-fold greater than was required to see an effect with the bovine peptide.
To define further the CVC.6 epitope a nested set of peptides derived from the bovine LCa sequence was made. These had a common carboxyl terminus at residue 208 and stepwise amino termini from residues 188 to 199. Strong inhibition was seen with peptides having amino-terminal residues 188-192, significant but weak inhibition was seen with peptides having amino-terminal residues 193-194, and no inhibition was seen with the shorter peptides (Fig. 6).
FIG. 6. Inhibition of CVC.6 binding by synthetic peptides. The binding of CVC.6 to a solid phase peptide corresponding to residues 188-208 of bovine LCa in the presence of fluid phase peptides was measured. Inhibitory peptides were bovine LCa residues 188-208
These results prove that the glutamic acid residue at position 198 of bovine LCa is absolutely critical for the CVC.6 epitope, which is formed by residues 192-208 of bovine LCa. Although flanking residues are clearly important in forming the epitope, they confer no significant interaction with CVC.6 in the absence of glutamic acid at position 198.
CVC.6 binds equally well to free light chains and to light chains that are associated with heavy chains in triskelions and coated vesicles (15). This indicates that interaction of this antibody with its epitope is not perturbed by heavy chain binding. As the linear CVC.6 epitope minimally consists of residues 192-198, it is highly unlikely that this sequence is involved in heavy chain binding. These results strongly suggest that the heavy chain binding site does not extend beyond residue 192, consistent with the organization of the heptad repeats which end at residue 193 (Fig. 3).
The region from the end of the heptad repeats to the carboxyl terminus is almost perfectly conserved between species. However, there are 14 substitutions out of 51 residues between LCa and LCb, including distinct patterns of cysteine residues which in vitro can readily form disulfide bonds (28). This comparison suggests that this carboxyl-terminal region may, in addition to the amino-terminal region, provide a site of functional diversification for clathrin light chains LCa and LCb.
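The logic of the nested-peptide mapping of the CVC.6 epitope, described earlier in this section, can be restated as a short Python sketch. The residue ranges and inhibition levels are those given in the text; the function names and the rule used to read off the amino-terminal limit (the shortest peptide that still inhibits strongly) are illustrative assumptions rather than a description of how the data were analyzed.

```python
def nested_peptides(first_start, last_start, c_terminus):
    """Residue ranges (start, end) for peptides sharing one carboxyl terminus."""
    return [(start, c_terminus) for start in range(first_start, last_start + 1)]


def epitope_n_terminal_limit(inhibition_by_start):
    """Largest amino-terminal start that still gives strong inhibition;
    residues before it are not needed for full binding."""
    strong = [start for start, level in inhibition_by_start.items() if level == "strong"]
    return max(strong) if strong else None


peptides = nested_peptides(188, 199, 208)

# Inhibition levels as summarized in the text for each amino-terminal residue.
inhibition = {start: "strong" for start in range(188, 193)}
inhibition.update({193: "weak", 194: "weak"})
inhibition.update({start: "none" for start in range(195, 200)})

print(len(peptides), epitope_n_terminal_limit(inhibition))
```

Under this reading the set comprises 12 peptides and the limit falls at residue 192, in line with the assignment of the epitope to residues 192-208.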