EVOLUTION OF THE CLTCL1 GENE ENCODING CHC22 CLATHRIN REVEALS SELECTION INFLUENCING CHC22’S ROLE IN HUMAN GLUCOSE METABOLISM

CHC22 clathrin plays a key role in intracellular membrane trafficking of the insulin-responsive GLUT4 glucose transporter, and so in post-prandial clearance of glucose from human blood. We performed population genetic and phylogenetic analyses of the CLTCL1 gene, encoding CHC22, to understand its variable presence in vertebrates and gain insight into its functional evolution. Analysis of ~50 complete vertebrate genomes showed independent loss of CLTCL1 in nine lineages after it arose from a gene duplication during the emergence of jawed vertebrates. All vertebrates considered here retained the parent CLTC gene encoding CHC17 clathrin, which mediates endocytosis and other housekeeping membrane traffic pathways, as performed by the single clathrin gene product in non-vertebrate eukaryotes. Statistical analysis provides evidence of strong purifying selection over phylogenetic timescales for CLTCL1, as well as for CLTC, supporting preserved functionality of CHC22 in those species that have retained CLTCL1. In population genetic analyses of humans and chimpanzees, extensive allelic diversity was observed for CLTCL1 compared to CLTC. In all human populations, two variants of CLTCL1 segregate at high frequency, resulting in CHC22 protein with either methionine or valine at position 1316. The V1316 variant occurs only in humans, but the same site is polymorphic in non-human primates as well. Analysis of archaic and ancient humans assigned the appearance of the derived V1316 allele to 500-50 KYA. Balancing selection on the two high-frequency CHC22 variants is inferred, with V1316 being more frequent in farming populations than in hunter-gatherer populations. Together these analyses suggest that CHC22 clathrin is undergoing selection in humans related to its role in nutrient metabolism.
Consistent with this conclusion, we observed functional differences between the two CHC22 variants in their ability to control GLUT4 membrane traffic, as predicted by structural modeling and differences in cellular dynamics of the two variants.


INTRODUCTION
Clathrin-coated vesicles (CCVs) are key players in eukaryotic intracellular membrane trafficking (1). Their characteristic lattice-like coat is self-assembled from cytoplasmic clathrin proteins, captures membrane-embedded protein cargo and deforms the membrane into a vesicle. This process enables CCVs to mediate protein transport to and from the plasma membrane and between organelles. The triskelion-shaped clathrin molecule is formed from three identical clathrin heavy chain (CHC) subunits. Humans have two genes (CLTC and CLTCL1) that respectively encode CHC17 and CHC22 clathrins (2). CHC17 clathrin, which has three bound clathrin light chain (CLC) subunits, is expressed uniformly in all tissues and forms CCVs that control receptor-mediated endocytosis, as well as lysosome biogenesis and maturation of regulated secretory granules. These pathways are conventionally associated with clathrin function and are mediated by clathrin in all eukaryotic cells (1). CHC22 clathrin is most highly expressed in human muscle and adipose tissue and forms separate CCVs that are not involved in endocytosis (3). Instead, CHC22 CCVs regulate targeting of the glucose transporter 4 (GLUT4) to an intracellular compartment where it is sequestered until released to the cell surface in response to insulin (4). This insulin-responsive GLUT4 pathway is the dominant mechanism for clearing blood glucose into muscle and fat after a meal (5). In addition to their distinct tissue expression patterns and biological functions, the two clathrins segregate in cells and CHC22 does not bind the CLC subunits that associate with CHC17 clathrin, even though the CHC protein sequences are 85% identical (3,6). This remarkable biochemical and functional divergence has evolved since the occurrence of a gene duplication that gave rise to the two different clathrins, 510-600 MYA (2) during the emergence of chordates. 
Notably, however, the CLTCL1 gene encoding CHC22 has evolved into a pseudogene in the Mus genus, although mice have an insulin-responsive GLUT4 pathway for clearing blood glucose. To understand the evolution of the specialized function of CHC22 and the potential selective processes involved, we here explore the phylogenetic history of the CLTCL1 gene in vertebrates and its population genetics in humans, non-human primates and bears.
bioRxiv preprint doi: https://doi.org/10.1101/307264. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC 4.0 International license.
Alterations in ecology create selective forces that filter variation in cellular genes. These include changes in nutritional conditions (7), as well as encounters with pathogens (8); both are documented as selective forces that affect membrane traffic genes (9,10). Recent studies of the evolution of genes involved in membrane trafficking have focused on an overview of all eukaryotes with the goals of establishing the origins of membrane-traffic regulating proteins in the last common eukaryotic ancestor and defining the species distribution of various families of traffic-regulating proteins (11,12). These studies have identified common features of proteins that regulate membrane traffic (11) and also revealed that extensive gene duplication has allowed lineage-specific diversification of coat proteins and other membrane traffic regulators, such as the Rab GTPases (13,14). Our earlier study of available annotated genomes in 2005 suggested that the gene duplication giving rise to the two CHC-encoding genes occurred as a result of one of the whole genome duplications contributing to chordate evolution (2). Here we focus on the more recent evolutionary history of these genes, taking advantage of the extensive increase in the number of annotated genomes in the last thirteen years. We establish that the Mus genus is not unique in its post-chordate loss of CLTCL1, identifying nine independent gene loss events in vertebrates that affected twelve studied species. Nonetheless, there is strong evidence for CHC22 sequence conservation amongst those species that retained CLTCL1 (2). This evolutionarily recent gene loss in some lineages and retention of the functional form in others suggested that CLTCL1 may still be under purifying selection, so we also examined CLTCL1 variation in populations of vertebrates with annotated genome sequences from multiple individuals including humans and non-human primates.
In all primates, we found CLTCL1 to be considerably more variable than CLTC, which encodes the clathrin found in all eukaryotes.
Comparative analysis of the two genes demonstrated strong ancient purifying selection for CHC17 clathrin function and diversifying (or relaxed purifying) selection on CHC22 function. Additionally, we identified two main genetic variants of human CHC22, which have different functional properties, indicating species-specific variation in CHC22 function. We previously observed that CHC22 accumulates at sites of GLUT4 retention in the muscle of insulin-resistant patients with type 2 diabetes (4). Thus, CHC22 variation has potential significance for membrane traffic pathways involved in human glucose metabolism. The studies reported here lead us to propose that variation in the CHC22 clathrin coat may be a response to changing nutritional pressures both between and within vertebrate species.

Phylogenetic analyses reveal complete loss or retention of a functional CLTCL1 in different species
Identification of CHC-encoding genes in >50 vertebrate and non-vertebrate species (Fig. S1) revealed a dynamic gene family with gene duplications and losses (Fig. 1) common marmoset (Callithrix jacchus). This updated genetic analysis further indicates that our apparent detection of CHC22 in rat cells was not correct, and likely due to antibody cross-reactivity with other rat proteins, not appreciated at the time (17).
CLTC and CLTCL1 are located on paralogous regions of human chromosomes 17 and 22, respectively. Several adjacent paralogues have been maintained in these chromosomal regions, some of which are involved in membrane traffic, including the gene pair of MTMR4 and MTMR3, encoding the myotubularin lipid phosphatases. In addition, the CLC subunits of CHC17 clathrin are encoded by paralogous genes (CLTA and CLTB) that arose from a local gene duplication, mapped to the same time frame as the CHC-encoding duplication (2). To examine whether the evolution of CLTC and CLTCL1 is typical for genes with related functions that duplicated in the same time frame, the evolutionary rates (dN/dS, rate of nonsynonymous mutations to rate of synonymous mutations) across vertebrates of each position of the three pairs of paralogous genes were determined and plotted along the length of the protein sequences (Fig. 2A, B). Comparison of the distribution of dN/dS ratios for the three pairs revealed stronger purifying selection on the CLTC/CLTCL1 genes than on the other two pairs (Fig. 2C-E), suggesting the entire clade is evolutionarily constrained. This observation is consistent with our previous identification of conserved signature residues in CLTCL1 using DIVERGE analysis (2) and indicates conserved functions for both the CLTC and CLTCL1 gene products. Furthermore, there was a striking difference in the distribution and average of evolutionary rates between CLTC and CLTCL1 (Kolmogorov-Smirnov test p-value < 2.2e-16), with CLTC being significantly more constrained by purifying selection than CLTCL1.
In contrast, there was minimal difference in the distribution and average of evolutionary rates between the two paralogs in MTMR4/MTMR3 and CLTA/CLTB (Kolmogorov-Smirnov test yields p-values 0.003643 and 0.9959 respectively).
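As an illustration of the comparison described above, the following is a minimal sketch of a two-sample Kolmogorov-Smirnov test applied to two lists of per-site dN/dS values. It is not the authors' analysis code: the function name `ks_two_sample`, the example values, and the use of the asymptotic p-value approximation are all assumptions made for illustration.

```python
import math
from bisect import bisect_right


def ks_two_sample(a, b):
    """Two-sample Kolmogorov-Smirnov test.

    Returns (D, p), where D is the largest absolute difference between the
    two empirical CDFs and p is an asymptotic approximation of the p-value.
    """
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    # Evaluate both empirical CDFs at every observed value and keep the largest gap.
    d = max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m) for x in a + b)
    # Asymptotic Kolmogorov distribution with a small-sample correction
    # (an assumed, commonly used approximation; exact methods also exist).
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.3:
        # The survival function is indistinguishable from 1 in this range.
        return d, 1.0
    p = 2.0 * sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam) for j in range(1, 101)
    )
    return d, min(max(p, 0.0), 1.0)


if __name__ == "__main__":
    # Hypothetical per-site dN/dS values for two paralogs (illustrative only).
    gene_a = [0.01, 0.02, 0.02, 0.03, 0.05, 0.04, 0.01, 0.02]
    gene_b = [0.10, 0.15, 0.08, 0.20, 0.12, 0.30, 0.09, 0.18]
    d, p = ks_two_sample(gene_a, gene_b)
    print(f"D = {d:.3f}, approximate p = {p:.3g}")
```

A small p-value indicates that the two distributions of evolutionary rates differ, as reported for CLTC versus CLTCL1; a p-value near 1 indicates no detectable difference, as reported for CLTA versus CLTB.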

Human population genetics analyses indicate purifying selection with ongoing diversification for CLTCL1
To follow up the indication that CLTC and CLTCL1 are subject to different degrees of purifying selection during evolution, we investigated their variation in human populations. We analyzed 2,504 genomes from the 1000 Genomes Project database, phase 3 (18) and identified alleles resulting from nonsynonymous substitutions for CLTC and CLTCL1. This dataset included individuals from each of five human meta-populations (European (EUR, 503), East Asian (EAS, 504), Admixed American (AMR, 347), South Asian (SAS, 489) and African (AFR, 661)). The reference sequences for chimpanzee (Pan troglodytes) and pseudo-references for two archaic humans, Altai Neanderthal and Denisovan were also included to relate allelic variation to the ancestral state. A median-joining network for all the inferred CLTC human alleles showed a very common allele (sample frequency = 0.997) with only five low-frequency variants generating a total of six alleles (Fig. 3A). In this network, the allele frequency is indicated by the size of the circle and lines between circles indicate the number of non-synonymous changes between alleles. The meta-populations in which the allele is found are indicated in color representing their percentage of the total frequency of the allele in humans.
In contrast to CLTC, we identified 46 non-synonymous SNPs in CLTCL1, present in 52 distinct alleles (Tables S1-S2). A median-joining network for the 20 most common CLTCL1 alleles showed that they are widely distributed within the human macro-populations (Fig. 3B). Each macro-population tends to have private, less frequent alleles. Nevertheless, all the macro-populations comprised two main allelic clades, together constituting a sample frequency of 77%. These alleles differ by a single M1316V substitution (SNP ID rs1061325 with genomic location chr22:19184095 on hg19 assembly). The valine at position 1316 is predicted by PolyPhen (19) to have a functional effect on the protein and was categorized as "probably damaging" with a probability of 0.975. The ancestral sequences from chimpanzee and ancient humans clustered within the M1316 clade, suggesting that M1316 is likely to represent the ancestral state. To further investigate this, we inspected raw sequencing data for both Altai Neanderthal and Denisovan (Table S3). We inferred the most likely genotype to be homozygous for the M1316 amino acid (minimum depths equal to 40 and 28, respectively). We further extracted sequencing data for 13 archaic and ancient humans (Table S3). We found that the V1316 amino acid is present in Pleistocene hunter-gatherers
To assess whether such summary statistics are expected or not under neutral evolution of each population, a set of control genes with the same coding length as CLTCL1 was similarly processed (Table S5).
Since we analyzed only non-synonymous substitutions, which are likely to behave differently than silent mutations, standard coalescent simulations are not suitable for deriving the expected distribution of genetic diversity. By analyzing more than 500 control genes, we computed the empirical percentile rank for each statistic for each population. High and low values (e.g. > 0.95 and < 0.05) are indicative of statistics for CLTCL1 being an outlier in the empirical null-distribution with properties unlikely to occur by random genetic drift. Summary statistics and populations were then clustered according to their empirical ranks and plotted on a heat map (Fig. 4). All populations exhibited high empirical ranks for S, with CLTCL1 being an outlier in the distribution for many populations (percentile rank > 0.95). Similarly, most of them, apart from CHS (East Asians), showed a high ranking for CLTCL1 SS (the number of singletons), although SS is very sensitive to sample size. Nevertheless, in all populations CLTCL1 showed high values for H2 and H2H1, both suggesting an unusually high frequency for the second most common allele, a situation likely to occur under balancing selection (20) or as a result of a soft sweep (21). That CLTCL1 was low ranking in all populations for H1, a statistic depicting the frequency of the most common allele, also supported diversifying selection rather than hard sweeps.
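The empirical percentile rank described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function names, the convention of counting ties as "less than or equal", and the example values are assumptions; only the 0.05 and 0.95 thresholds come from the text.

```python
def empirical_percentile_rank(target_value, control_values):
    """Fraction of control genes whose statistic is <= the target gene's value.

    Ties are counted as "less than or equal" (an assumed convention).
    """
    if not control_values:
        raise ValueError("need at least one control gene")
    n_not_above = sum(1 for v in control_values if v <= target_value)
    return n_not_above / len(control_values)


def is_outlier(rank, low=0.05, high=0.95):
    """Flag ranks falling in either tail of the empirical null distribution."""
    return rank < low or rank > high


if __name__ == "__main__":
    # Hypothetical values of one summary statistic (e.g. S) in 100 control genes.
    controls = list(range(100))
    target = 97  # hypothetical value of the same statistic for the gene of interest
    rank = empirical_percentile_rank(target, controls)
    print(rank, is_outlier(rank))
```

Because the null distribution is built from control genes processed in the same way, this approach avoids assuming a neutral model for non-synonymous sites, which is the motivation given above for not using coalescent simulations.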
Similarly, TD, FLDs, and FLFs values for CLTCL1 were generally lower than observed for the 500 test genes, although this was not consistent across all populations. Inconsistency may be partly explained by these statistics not having the power to capture complex selective scenarios. Nevertheless, the distribution of empirical percentile ranks for all analytical parameters of CLTCL1 variation (Fig. 4)
We did not find any suggestion that FST values for CLTCL1 are outliers in the empirical distribution of control genes, with percentile ranks between 0.15 and 0.37. This is compatible with selection acting on distinct populations to different extents.
The high frequency of two major alleles of CLTCL1 that appear in all modern human populations (Fig. 3B, Table S1) is a potential indicator of balancing selection (20). Such a distribution was confirmed using a different data set of more than 50 sampled human populations (Fig. S2). In several populations, we also observed an apparent excess of heterozygosity for the M1316V variant (Table S6) (Table S7). A selective pressure that might be acting on CLTCL1, irrespective of population distribution, could be changes in dietary habits that have been occurring globally since the advent of crop cultivation. Farming, followed by food manufacture, has gradually, then catastrophically, increased carbohydrate availability and consumption by humans. As CHC22 clathrin, the gene product of CLTCL1, is required for formation of the intracellular pathway critical for an insulin response, its genetic history could potentially be influenced by these changes. To address the hypothesis that nutritional habits conferred selective pressure on CLTCL1, we compared the frequency of the major SNP (rs1061325) in farming vs hunter-gatherer population samples for ancient and modern humans. Although the appearance of SNP rs1061325 predates the advent of farming (Table S3), the observed frequencies support potential selection once farming became common practice for a population (Fig. 6).
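The excess of heterozygosity mentioned above is conventionally judged against Hardy-Weinberg expectations. The sketch below shows that comparison for a biallelic site such as M1316V. It is a generic illustration, not the procedure used for Table S6: the function name and the genotype counts are hypothetical.

```python
def heterozygosity_excess(n_mm, n_mv, n_vv):
    """Compare observed heterozygosity with the Hardy-Weinberg expectation.

    Arguments are genotype counts for M/M, M/V and V/V individuals.
    Returns (observed, expected, observed - expected).
    """
    n = n_mm + n_mv + n_vv
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Frequency of the V allele among the 2n sampled chromosomes.
    p_v = (2 * n_vv + n_mv) / (2 * n)
    expected = 2 * p_v * (1 - p_v)  # Hardy-Weinberg heterozygote frequency, 2pq
    observed = n_mv / n
    return observed, expected, observed - expected


if __name__ == "__main__":
    # Hypothetical genotype counts for one population sample (illustrative only).
    observed, expected, excess = heterozygosity_excess(n_mm=30, n_mv=60, n_vv=10)
    print(f"observed={observed:.3f} expected={expected:.3f} excess={excess:+.3f}")
```

A positive difference means more heterozygotes than expected under random mating, which is one of the patterns compatible with balancing selection; judging whether a given excess is statistically meaningful would require a formal test that this sketch does not include.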

Genetic variation in non-human vertebrate populations supports functional diversification of CLTCL1
We analyzed allelic variation for CLTC and CLTCL1 in the genomes of 79 individuals representing six species of great ape, two species each for chimpanzee, gorilla and orangutan (Pan troglodytes, Pan paniscus, Gorilla beringei, Gorilla gorilla, Pongo abelii, Pongo pygmaeus). After data filtering and haplotype phasing, we found no non-synonymous SNPs for CLTC and 64 putative non-synonymous SNPs for CLTCL1 (Table S8). In gorillas, orangutans and chimpanzees one of the non-synonymous changes in CLTCL1 leads to a premature stop-codon at amino acid position 41, with an overall frequency of 36%. However, sequences containing the stop-codon exhibited only a marginal increase of nucleotide diversity (+4.7% as measured by Watterson's index (22)) compared to the full-length sequences, suggesting that these are relatively new variants. Notably, CLTCL1 in all the non-human primates analyzed does not have the V1316 amino acid replacement, which appears private to humans. It is interesting, however, that in gorillas, orangutans and chimpanzees we observe a common but different substitution at the same amino acid position, to T1316.
To further investigate this point, we increased the sample size per species by analyzing CLTCL1 variation in 70 genomes of chimpanzees and bonobos. A median-joining network for the inferred 8 CLTCL1 alleles (Table S9) showed a major allele common to different species with less frequent alleles primarily restricted to individual ones (Fig. 7). We observed considerable diversity in non-human primates, with a potential tendency towards multiple variants. However, amino acid 1316 was not covered in this data set due to poor sequence data mappability, probably associated with the high nucleotide diversity observed. We are therefore unable to establish the frequency of T1316 amino acid polymorphism, nor was it possible to establish a pattern of co-dominance within a species, as seen for the human population.
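The comparison of nucleotide diversity between stop-codon-containing and full-length great ape sequences reported earlier in this section relies on Watterson's index. A minimal sketch of that estimator and of a relative comparison between two groups is given below. The function names and input numbers are hypothetical and chosen only to show the calculation; they are not the values behind the reported +4.7%.

```python
def watterson_theta(num_segregating_sites, num_sequences):
    """Watterson's estimator: theta_W = S / a_n, with a_n = sum_{i=1}^{n-1} 1/i."""
    if num_sequences < 2:
        raise ValueError("need at least two sequences")
    a_n = sum(1.0 / i for i in range(1, num_sequences))
    return num_segregating_sites / a_n


def relative_difference(theta_group, theta_reference):
    """Relative change of one diversity estimate with respect to another."""
    return (theta_group - theta_reference) / theta_reference


if __name__ == "__main__":
    # Hypothetical counts of segregating sites and sampled sequences (illustrative only).
    theta_stop = watterson_theta(num_segregating_sites=22, num_sequences=40)
    theta_full = watterson_theta(num_segregating_sites=23, num_sequences=60)
    change = relative_difference(theta_stop, theta_full)
    print(f"theta_stop={theta_stop:.3f} theta_full={theta_full:.3f} change={change:+.1%}")
```

Dividing the number of segregating sites by the harmonic term a_n corrects for the fact that larger samples uncover more variable sites, so the two groups can be compared even when they contain different numbers of sequences.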
We further investigated CLTCL1 variation in polar bears (Ursus maritimus) and their closest related species, brown bears (Ursus arctos). These two species, which diverged 479-343 kya
This sample of bear populations supports the emergence of multiple CLTCL1 variants within a species and potential diet-related selection.

Modelling CHC22 variation based on CLTCL1 polymorphism suggests an effect on clathrin lattice contacts
One explanation for selection of the human-specific CLTCL1 allele encoding V1316 is that this amino acid change might confer a functional change in the clathrin lattice. This was predicted by the PolyPhen analysis, highlighting the change as potentially structure-altering.
Most humans are heterozygous for the M1316/V1316 allotypes, and there may potentially be special properties for mixed lattices formed from the two gene products. To address the possibility that this polymorphism affects protein function, we used MODELLER (24) to produce a homology model of the two CHC22 allotypes based on the crystal structure of CHC17 clathrin (PDB 1B89), taking advantage of the 85% protein sequence identity between human CHC17 and CHC22 (Fig. 8). The two homology models were positioned in the cryo-electron microscopy map of the bovine clathrin lattice (EMD: 5119) by structural superposition on the atomic model (25) originally fitted in the map (PDB:1XI4). Modeling analysis using UCSF Chimera (26) showed that residue 1316 is found at a key interface between triskelion legs in assembled clathrin (Fig. 8A and top of panel B). If M1316 is substituted by V1316, the smaller side chain creates a void that would be energetically unfavorable (Fig. 8, bottom of panel B), such that the triskelion leg might twist slightly to close the void. In the clathrin lattice, the legs have a torque that rotates the assembly interface along the protein sequence (27), so a further twist could slightly adjust the interface, altering assembly interactions. Changes in the assembly interface could affect integrity of the lattice and potentially influence disassembly. Mixed lattices of the two CHC22 allotypes would therefore have different properties from CHC22 coats formed in homozygotes for the two major variants. CHC22 is needed for the traffic of GLUT4 to its intracellular storage compartment, where GLUT4 awaits release to the plasma membrane in response to insulin.
However, CHC22 also accumulates at the GLUT4 storage compartment when it expands due to impaired GLUT4 release in cases of insulin-resistant type 2 diabetes (4).
Thus, genetic variation of CHC22 could alter rates of retention and release of GLUT4 in both health and disease.

CHC22 variants display functional differences
To test whether the evolutionary change from M1316 to V1316 in CHC22 clathrin alters its properties, three aspects of CHC22 biochemistry and function were compared for the two allotypes. HeLa cells were transfected with constructs encoding each CHC22 variant tagged with green fluorescent protein (GFP). Atypically for their epithelial cell origin but not for transformed cells, HeLa cells express CHC22 clathrin (they are homozygous for the M1316 variant) (28,29). We observed that the transfected fluorescently tagged CHC22 variants both localized similarly to endogenous CHC22-Met1316 (30). The dynamics of membrane association for the two allotypes was then tested. This assay measures turnover of clathrin in vesicle coats using Fluorescence Recovery After Photobleaching (FRAP), which would be an indication of coat stability. In these assays, cells expressing fluorescent proteins were photobleached in a defined area (Fig. 9A) and the rate of recovery of fluorescence was measured for each CHC22 allotype and for CHC17 clathrin (encoded by CLTC). Recovery of CHC17 fluorescence was the most rapid, consistent with its more soluble properties compared to CHC22 (3). CHC22-M1316 showed the slowest recovery and CHC22-V1316 was intermediate (Fig. 9B), suggesting that it forms a more readily exchangeable coat.
The function of the CHC22 variants in GLUT4 retention was then assessed. Because HeLa cells express CHC22, they can form a GLUT4 storage compartment (GSC) when transfected to express GLUT4. These cells can sequester GLUT4 intracellularly, and then release it to the plasma membrane in response to insulin, behaving like muscle and adipocytes, though with a more modest response (30-32). Transfection of HeLa cells with siRNA depleting CHC22 ablates this insulin-responsive pathway (30). To detect GLUT4 release to the cell surface, we used a construct expressing GLUT4 tagged with mCherry and a hemagglutinin (HA) epitope embedded in an exofacial loop of the transporter (HA-GLUT4-mCherry). Appearance of surface GLUT4 in response to insulin was detected by fluorescence-activated cell sorting (FACS) using an antibody to the HA epitope (Fig. 9D). We then assessed whether siRNA inhibition of insulin-responsive GLUT4 release can be rescued by expression of CHC22-M1316-GFP or CHC22-V1316-GFP. These are the same constructs characterized in Fig. 9A and they are siRNA-resistant, as well as being GFP-tagged. CHC17 transfection was performed for comparison. It was possible to analyze cells with equal levels of expression of the rescue constructs, as measured by GFP expression.
Cells from each transfection were divided into thirds expressing equivalently low, medium and high levels of transfected CHC for analysis. We observed higher levels of total GLUT4 content as measured by mCherry fluorescence in cells expressing CHC22-M1316-GFP, compared to cells expressing either CHC22-V1316-GFP or CHC17-GFP at both medium and high levels of CHC expression (Fig. 9C). This suggests that GLUT4 is more sequestered from degradative membrane traffic pathways when packaged into the GSC by CHC22-M1316 than by CHC22-V1316, indicating that the M1316 variant is more efficient at GLUT4 transport. We observed that, when expressed at the same levels in cells where endogenous CHC22 was silenced, CHC22-M1316 was able to restore the insulin response but CHC22-V1316 was not (Fig. 9D). We know, however, that CHC22-V1316 is functional for GLUT4 sequestration because CHC22-transgenic mice that express CHC22-V1316 in their muscle from the natural human promoter show GLUT4 sequestration in their muscle that exceeds sequestration in wild-type mice without CHC22, leading to higher blood glucose in the transgenic animals (4). We conclude that HeLa cells are only just able to form a functional GSC from which GLUT4 can be released, because they express the M1316 variant, which appears to be most active for GLUT4 sequestration (Fig. 9C). For these cells, CHC22-V1316 is inadequate to restore GSC formation when CHC22-M1316 is depleted. Using this HeLa model is necessitated by the lack of natural models for the CHC22-dependent GLUT4 pathway in myoblasts and adipocytes, as well as a lack of antibodies that detect surface GLUT4.
Nonetheless, these experiments demonstrate a functional difference between CHC22-M1316 and CHC22-V1316 and suggest that CHC22-V1316 is less efficient at GLUT4 sequestration.

DISCUSSION
We analyzed the phylogenetics and population genetics of CHC22 clathrin to understand the functional variation of this protein in relation to its evolutionary history. CHC22 clathrin is a key player in post-prandial clearance of glucose from human blood through its role in intracellular packaging of the GLUT4 glucose transporter in muscle and fat, the tissues in which CHC22 and GLUT4 are expressed (4). The CHC22 pathway positions GLUT4 for cell surface release in response to insulin and consequent uptake of glucose into these tissues (33). The gene encoding CHC22 resulted from a gene duplication that we have now mapped to 494-451 MYA, early in vertebrate evolution when jawed vertebrates emerged. We had previously shown that the gene encoding CHC22 (CLTCL1) is a pseudogene in mice (2).
Expanding analysis to ~50 complete vertebrate genomes (>5X coverage) we observe that CLTCL1 is absent from twelve vertebrate genomes, and infer independent loss in at least nine clades. All vertebrate and non-vertebrate eukaryotes considered here have retained the parent CLTC gene encoding CHC17 clathrin, which mediates endocytosis and other housekeeping membrane traffic pathways (12). Our new analysis establishes that CLTC is under strong purifying selection. Notably, CLTCL1 also shows evidence of purifying selection in the species in which it has been retained, supporting the functionality of CHC22 in those species. Compared to CLTC, extensive allelic diversity was observed for CLTCL1 in all species where populations were analyzed, including humans, chimpanzees, and bears.
Variant alleles were species-specific in most cases and unique variants were found in species with extreme dietary habits, such as polar bears. In all human populations, two variants of CLTCL1 are present in high frequency, differing only at one nucleotide, resulting in CHC22 protein with either methionine or valine at position 1316. The V1316 variant occurs only in humans, but some non-human primates have a different variant at the same position.
Analysis of ancient humans mapped the appearance of the V1316 variant to 500-50 KYA and indicated that M1316, which is present also in CHC17 clathrin, is the ancestral state.
Statistical analysis of human populations provided evidence for balancing selection of the two variants of CHC22 and suggested that this may have become more pronounced in farming, as compared to hunter-gatherer populations. Together these analyses suggest that CHC22 clathrin is undergoing selection in humans and we hypothesize that selective pressure results from its role in nutrient metabolism. Consistent with this hypothesis, we observed functional differences between the two CHC22 variants in their ability to control GLUT4 membrane traffic, as predicted by structural modeling and differences in cellular function of the two variants.
Our phylogenetic analysis reveals that after the ancestral gene encoding CHC duplicated in vertebrates, the CHC functions encoded by CLTC in other eukaryotes were retained and maintained in all vertebrates. The CHC22 paralogue encoded by CLTCL1 has been independently lost in various vertebrate lineages, though purifying selection was evident in the species that retained it. The cellular pathway in which CHC22 clathrin functions, insulin-responsive glucose uptake mediated by membrane traffic of GLUT4, is still present in the species lacking CLTCL1 and CHC22. Our study of the pathway mediated by CHC22 indicates that CHC22 enhances membrane traffic in an existing transport route for GLUT4 from the secretory pathway such that this route becomes more efficient for delivering GLUT4 to its insulin-responsive storage compartment and complements the established endocytic pathway for GLUT4 targeting (30). Thus, we suggest that species with functional CHC22 clathrin are better at intracellular GLUT4 sequestration, resulting in lower surface GLUT4 in the absence of insulin, and tighter regulation of GLUT4 release in response to insulin.
Vertebrate species without CHC22 rely mainly on the endocytic membrane traffic route, as defined in murine cells lacking CHC22, so these species are less efficient at GLUT4 sequestration. The presence of CHC22 in humans would have been favorable for the development of our large brain, preserving blood glucose levels for brain function during starvation (34). However, for species that are incessant herbivores or naturally herbivorous, with their need for constant low-level clearance of blood glucose, CHC22 would not support their life-style. Eight of the twelve vertebrates that have lost CLTCL1 fall into this category, including mouse, rat, horse, sheep, cow, elephant, the common marmoset, mainly a sap-eating herbivore (35), and the pig, an omnivore with a preference for plant matter (36). The remaining four species without CLTCL1 are insect and shellfish eaters (bat, platypus, cave fish, elephant shark), effectively carnivores, which do not need an insulin-responsive glucose clearance pathway. If CLTCL1 started accumulating mutations during the evolution of any of these 12 species, there would have been no selective pressure to retain it, given their dietary habits. The cave fish, which lacks CLTCL1, is a notable example here because these animals have the capacity to respond to insulin but have independently evolved mutations in their insulin receptor, rendering them naturally insulin resistant, consistent with this pathway being a target for natural selection driven by diet (37). However, CHC22 can be tolerated in species that do not need it, as evidenced by CHC22-transgenic mice (4), which are viable but exhibit excessive GLUT4 sequestration and age-related high blood glucose.
So, loss of CLTCL1 in a species that does not need it may have occurred only if the gene began to deteriorate, and other pathways were sufficient to compensate over time. Thus, we can consider CLTCL1 as a gene under ongoing selection during vertebrate evolution.
The allelic variation reported here for CLTCL1 in human populations supports this notion.
While purifying selection appears to be operating on CLTCL1 in the species that have retained it, CLTCL1 is still far more variable than CLTC in the same species. In humans, we find statistical support for balancing selection establishing two major and functionally distinct alleles at remarkably similar frequencies in all population samples studied. The SNP distinguishing these alleles is human-specific and apparently arose 550-50 KYA (post-Neanderthal, pre-Neolithic). Statistical analysis comparing modern farming populations with modern hunter-gatherers shows an apparent increase of the V1316 variant, suggesting a correlation with regular consumption of farmed carbohydrate. The fact that the two alleles are functionally distinct indicates that diversifying selection is also operating on CLTCL1.
Clathrins are self-assembling proteins and function as a latticed network in the protein coat that they form on transport vesicles. Our structural modelling predicts that the SNP-encoded change in CHC22 of the two main alleles could influence the strength of molecular interactions in the CHC22 clathrin lattice, as position 1316 occurs at a lattice assembly interface (Fig. 8). When expressed in cells, both CHC22 variants had the same overall intracellular distribution, but CHC22-V1316 showed more rapid turnover from membranes than CHC22-M1316 and was less effective at GLUT4 sequestration (Fig. 8). These properties are consistent with the Met to Val change attenuating GLUT4 retention. This interpretation is supported by a GLUT4 translocation assay, which indicated that the V1316 variant is not as effective in GSC formation as the ancestral M1316 form of CHC22.
Thus, mixed lattices that would occur in heterozygous individuals might reduce GLUT4 sequestration compared to M1316 homozygotes, which would have the effect of improving glucose clearance. It could be argued that human consumption of carbohydrate on a regular basis, requiring increased glucose clearance, might be the selective force driving this genetic adaptation, consistent with the increased frequency of the V1316 variant in farmers. In the non-human primates studied, CLTCL1 allelic diversity was high but it was not possible to assess allele frequencies for all variants. While balancing selection in humans correlates with a human-specific activity such as farming, other primate species could be diversifying their CLTCL1 with the potential effect of diluting its function. Whilst chimpanzees are omnivores, and gorillas herbivores, both rely on extensive foraging for carbohydrate for nutrition. It is also notable that polar bears, which have very low carbohydrate diets compared to their brown bear relatives, have distinct allelic variants of CHC22 with unknown functionality, again supporting the suggestion that CLTCL1 is undergoing selection driven by nutritional ecology. One possibility is that some forms of polar bear CHC22 are super-active at GLUT4 sequestration, providing a route to maintain high blood glucose, as occurs through other mutations in the cave fish (37).
Regulators of fundamental membrane traffic pathways have diversified through gene duplication in many species over eukaryotic evolution. Retention and loss can, in some cases, be correlated with special requirements resulting from species differentiation, such as the extensive elaboration of genes in the secretory pathway of Tetrahymena (12,38). The evolutionary history of CLTCL1, following vertebrate-specific gene duplication, suggests that differentiation of nutritional habits has driven selection of the presence and absence of CLTCL1 in some vertebrate species, and its diversification in humans and potentially other species. Though its highest expression is in muscle and adipose tissue, transient expression of CHC22 during human brain development has also been documented (39). This was noted in a study of a very rare null mutant of CLTCL1 that caused loss of pain sensing in homozygotes and no symptoms for heterozygotes (39). Attenuated CHC22 function of the V1316 variant might lead to a spectrum of pain-sensing in humans but is unlikely to be a strong selective force affecting reproductive success, whereas glucose homeostasis, as suggested by our analysis, is more likely. The presence of the CLTCL1 gene encoding CHC22 clathrin confers tight regulation on glucose uptake into muscle and fat tissue, which was likely beneficial to nutrition of the large human brain. However, in recent human history, readily available carbohydrate has increased our need to clear glucose from the blood, such that selection continues to act on CLTCL1. Our cell biology studies have demonstrated that CHC22 increases GLUT4 retention, so potentially could exacerbate insulin resistance through its accumulation on expanded intracellular pools of GLUT4 that we have observed in cases of insulin-resistant Type 2 diabetes (4). The genetic diversity that we report here may reflect evolution towards reversing a human tendency to insulin resistance.

input to generate a new phylogeny-aware MSA with PRANK (48). Branch lengths of the reconciled topologies were then re-estimated based on the MSA generated by PRANK.

Phylogenetics
To compute evolutionary rates, the sequences and subtrees corresponding to CLTC and   (19). Additional frequency information for a single mutation of interest in more than 50 human populations was retrieved from the HGDP CEPH Panel (53) (56). Data filtering, functional annotation and haplotype phasing were performed as described above.
Full genome VCF files were retrieved for two high-coverage archaic humans, one Altai Neandertal (57) and one Denisovan (58). Low-quality sites were filtered out using vcflib with the options "QUAL > 1 & DP > 10". A pseudo-reference sequence for each archaic human was constructed by replacing the heterozygous sites with the previously inferred human ancestral state. Sequencing data for additional ancient human samples were obtained from previously published high-quality whole genome sequences (59)(60)(61)(62)(63)(64)(65)(66).
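The two steps described above can be illustrated with a minimal Python sketch: sites are kept only if they pass the stated thresholds (QUAL > 1 and DP > 10), and heterozygous sites are then replaced with the ancestral state to give a pseudo-reference. The record layout, the function names, and the handling of homozygous and filtered sites are assumptions made for illustration; this is not the vcflib-based pipeline used in the study.

```python
from dataclasses import dataclass


@dataclass
class Site:
    """Simplified variant record (hypothetical layout, not a real VCF parser)."""
    pos: int          # 0-based position in the reference string
    alt: str          # alternate base
    qual: float       # site quality (QUAL)
    depth: int        # read depth (DP)
    genotype: tuple   # allele indices, e.g. (0, 1) for a heterozygote


def passes_filter(site, min_qual=1.0, min_depth=10):
    # Mirrors the stated filter "QUAL > 1 & DP > 10".
    return site.qual > min_qual and site.depth > min_depth


def pseudo_reference(reference, sites, ancestral):
    """Build a pseudo-reference sequence for one individual.

    Heterozygous sites take the ancestral state, as described in the text.
    Assumptions: homozygous-alternate sites take the alternate base, and
    sites that fail the filter keep the reference base.
    """
    seq = list(reference)
    for site in sites:
        if not passes_filter(site):
            continue
        first, second = site.genotype
        if first != second:
            seq[site.pos] = ancestral[site.pos]
        elif first == 1:
            seq[site.pos] = site.alt
    return "".join(seq)


# Toy example with made-up sequences and sites.
reference = "ACGTAC"
ancestral = "ACGAAC"
sites = [
    Site(pos=1, alt="G", qual=40.0, depth=25, genotype=(1, 1)),  # homozygous alt
    Site(pos=3, alt="C", qual=50.0, depth=30, genotype=(0, 1)),  # heterozygous
    Site(pos=5, alt="T", qual=50.0, depth=5, genotype=(0, 1)),   # fails DP filter
]
result = pseudo_reference(reference, sites, ancestral)
```

In this toy case the heterozygous site at position 3 is set to the ancestral base, while the low-depth site at position 5 is left unchanged.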
Genotype likelihoods were calculated using the standard GATK model (67).

The software popart version 1.7 (68) was used to generate median-joining haplotype network plots. Human samples were grouped according to either their population or assigned macro-population from the 1000 Genomes Project database. We used the EMBOSS Seqret tool (http://www.ebi.ac.uk/Tools/sfc/emboss_seqret/) to convert files from FASTA to NEXUS format. To assess whether the observed summary statistics are expected under neutral evolution, genes with a coding length approximately equal (+/-5%) to that of the tested gene, CLTCL1, were selected. For this analysis, the longest isoform of each gene and its annotation were taken from the refGene table of the UCSC Genome Browser.
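The length-matching criterion above can be sketched as follows. This is a minimal Python illustration, assuming a simple list of (gene, coding length) pairs rather than the actual refGene table; the gene names other than CLTCL1 and all lengths are made up, and the function names are hypothetical.

```python
def longest_isoform_lengths(isoforms):
    """Keep the longest coding length per gene from (gene, length) pairs."""
    longest = {}
    for gene, length in isoforms:
        if length > longest.get(gene, 0):
            longest[gene] = length
    return longest


def length_matched_controls(lengths, target_gene, tolerance=0.05):
    """Return genes whose coding length is within +/- tolerance of the target."""
    target = lengths[target_gene]
    lower = target * (1 - tolerance)
    upper = target * (1 + tolerance)
    return sorted(
        gene
        for gene, length in lengths.items()
        if gene != target_gene and lower <= length <= upper
    )


# Toy input: made-up coding lengths in base pairs, several isoforms per gene.
isoforms = [
    ("CLTCL1", 4200), ("CLTCL1", 5000),
    ("GENE_A", 4800), ("GENE_A", 3000),
    ("GENE_B", 5300),
    ("GENE_C", 5200),
    ("GENE_D", 3000),
]
lengths = longest_isoform_lengths(isoforms)
controls = length_matched_controls(lengths, "CLTCL1")
```

With a target length of 5000, the accepted window is 4750 to 5250, so GENE_A and GENE_C are retained as candidate controls while GENE_B and GENE_D are not.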
We discarded genes on chromosome 6 and on sex chromosomes, as well as CLTA, CLTB and CLTC. This set was further reduced to ensure that the coding length of CLTCL1

plotting purposes, summary statistics and populations were clustered according to a dendrogram inferred from their respective distances based on the calculated matrix of empirical percentile rank. That is, populations clustering together exhibit similar patterns of percentile ranks, and thus of summary statistics. The underlying dendrograms are not reported. The heatmap plot was generated using the function heatmap.2 in R (73) with the package gplots. Cells with an empirical percentile rank lower than 0.1, 0.05 or 0.01 were denoted with one, two or three asterisks, respectively.
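The asterisk annotation described above can be illustrated with a short Python sketch. The thresholds (0.1, 0.05, 0.01) come from the text; the definition of the empirical percentile rank as the fraction of control-gene values less than or equal to the observed value is an assumption, as are the function names and the toy control distribution.

```python
def empirical_percentile_rank(observed, control_values):
    """Fraction of control values that are <= the observed value.

    Assumed definition; the exact handling of ties in the original
    analysis is not stated in the text.
    """
    if not control_values:
        raise ValueError("control_values must not be empty")
    at_or_below = sum(value <= observed for value in control_values)
    return at_or_below / len(control_values)


def significance_label(rank):
    """Map a percentile rank to the asterisk code used in the heatmap."""
    if rank < 0.01:
        return "***"
    if rank < 0.05:
        return "**"
    if rank < 0.1:
        return "*"
    return ""


# Toy control distribution: 100 made-up summary-statistic values (1..100).
controls = list(range(1, 101))
low_rank = empirical_percentile_rank(3, controls)
mid_rank = empirical_percentile_rank(50, controls)
labels = [significance_label(r) for r in (0.005, low_rank, 0.07, mid_rank)]
```

Only low ranks are flagged here, following the sentence above; a two-tailed variant would also need to flag ranks close to 1.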

Antibodies, plasmids and reagents
Mouse monoclonal anti-CHC17 antibody TD.1 (75) and affinity-purified rabbit polyclonal antibody specific for CHC22 and not CHC17 (4) were produced in the Brodsky laboratory.
Commercial sources of antibodies were as follows: mouse monoclonal anti-β-actin (clone AC-15, Sigma), mouse monoclonal anti-HA (clone 16B12, Covance). Secondary antibodies coupled to HRP were from ThermoFisher, the secondary antibody coupled to Brilliant Violet 421 was from BioLegend. The HA-GLUT4-mCherry was generated by replacing the GFP from the HA-GLUT4-GFP construct (gift from Dr Tim McGraw (76)) with mCherry using KpnI and EcoRI. The generation of the CHC22 variant expressing a valine at position 1316 (CHC22V) was previously described (77). The CHC22 variant expressing a methionine at position 1316 (CHC22M) was generated from CHC22V by quick-change mutagenesis (New England Biotechnologies, USA) following manufacturer's instructions.

Small RNA interference
Targeting siRNA was produced to interact with DNA sequences AAGCAATGAGCTGTTTGAAGA for CHC17 (77)

to 170 kDa (Thermo Fisher Scientific). Signals were detected using the Chemidoc XRS+ imaging system (Bio-Rad) and quantifications were performed using ImageJ software (NIH).

for similarity threshold and sequence IDs). Based on the profile and species tree, the most parsimonious phylogenetic tree edge for loss and duplication events is inferred and shown as red stars and blue squares, respectively. We caution that the two gene losses inferred to have occurred in sharks may be assembly artefacts, as these two genomes have not undergone the same curation process as those genomes downloaded from the Ensembl database.

against their corresponding empirical distribution based on >500 control genes. The resulting matrix was then sorted on both axes as a dendrogram (not reported) based on the pairwise distances between each pair of populations. As depicted in the color legend, red and yellow denote low and high percentile ranks, respectively. Asterisks indicate levels of significance (* p < 0.10, ** p < 0.05, *** p < 0.01). The overall distribution of rank percentiles is shown inside the color legend.

Table S1. Human alleles for the coding region of CLTCL1 extracted from the 1000 Genomes project data set. For each unique allele (hap_ID in the first column), the frequency in each macro-population (EUR, EAS, AMR, SAS, AFR) and in archaic humans and modern chimpanzees is reported in columns 2 to 9. From column 10 onward, the nucleotide sequence of each unique allele is reported for all 46 retrieved SNPs, indexed on the remaining columns.

Table S2. Functional annotation for each coding polymorphism of CLTCL1 reported in Table S1. Columns represent: chromosome, genomic position, SNP ID, reference allele, alternate allele, functional annotation for each isoform.

Table S3. Archaic and ancient human M1316V genotypes.

Table S4. Summary statistics of nucleotide diversity for CLTCL1 calculated for all analyzed human populations. For each population on the rows, several summary statistics are reported on the columns. The notation follows the one reported in the main text and NSAM is the number of sampled chromosomes.

Fig. 3B) for M1316V for all analyzed human populations. See legend for Table S6 for a description of its content.

Table S8. Inferred CLTCL1 alleles for great apes. For each unique allele (hap_ID in the first column), the frequency in each species is reported. From column 13 onward, the amino acid sequence of each unique allele is reported for the whole protein.

Table S9. Inferred CLTCL1 alleles for chimpanzees and bonobos. For each unique allele (hap_ID in the first column), the frequency in each species is reported. From column 10 onward, the amino acid sequence of each unique allele is reported for the whole protein.

Statistical analyses