Amylase copy number analysis in several mammalian lineages reveals convergent adaptive bursts shaped by diet

The amylase gene (AMY), which codes for a starch-digesting enzyme in animals, underwent several gene copy number gains in humans1, dogs2, and mice3, presumably along with increased starch consumption during the evolution of these species. Here we present evidence for additional AMY copy number expansions in several mammalian species, most of which also consume starch-rich diets. We also show that these independent AMY copy number gains are often accompanied by a gain in enzymatic activity of amylase in saliva. We used multi-species coalescent modeling to provide further evidence that these recurrent AMY gene copy number expansions were adaptive. Our findings underscore the overall importance of gene copy number amplification as a flexible and fast adaptive mechanism in evolution that can independently occur in different branches of the phylogeny.

apes led to the formation of AMY1 which gained salivary gland specific expression8. In the human lineage, further gene copy number gains of AMY1, but not AMY2, led to increased expression of the AMY1 enzyme in human saliva1. Copy numbers of amylase vary in different human populations9 and correlate with the extent of traditional starch consumption in these communities dating back only 10,000-20,000 years1. Despite all these gene copy number gains, which are thought to be mediated by non-allelic homologous recombination1, the coding sequences of the individual gene copies remained highly conserved. This suggests that maintenance of function was adaptively relevant.

While the evolution of the amylase locus in the human lineage is well described, the evolution of this locus in other mammals is less well understood. For example, it has been shown that mice, rats, and pigs express substantial levels of salivary amylase10. However, the evolutionary dynamics that led to gain-of-expression of amylase in saliva in these lineages remain unclear. Another interesting question is the evolution of amylase in domesticated animals. Recent studies have shown that dogs have also gained multiple copies of amylase after their split from wolves within only the last 5,000 years, likely as a result of their domestication2,11. As such, the evolution of amylase in other domesticated or human commensal mammals remains an alluring area of inquiry. Similarly, our understanding of the evolution of the amylase locus within the primate lineage remains limited. For example, it is not known why some Old World monkeys express substantial amylase activity levels in saliva, despite missing the great ape specific salivary amylase duplication12.

Here we address three areas of inquiry with regard to the evolution of the amylase locus in mammals: (i) Can the link between diet and amylase evolution, well-established in the human lineage, be generalized to other mammals? (ii) What are the evolutionary forces that shape amylase copy numbers in mammals? (iii) What are the genetic mechanisms leading to salivary expression in different nonhuman mammals? To answer these questions, we pursued a comprehensive investigation of amylase gene copy number and salivary expression across multiple mammalian lineages.

Results and Discussion

Recurrent amylase copy number gains in multiple mammalian lineages

The human-specific duplications of amylase are unique in their scope. Human genomes harbor up to 5 more haploid copies than chimpanzee genomes. Moreover, most of these additional copies appear to contribute to expression of the amylase gene in saliva1. Therefore the recent revelation that a similar, independent increase in amylase copy number occurred in dogs2 is remarkable, since it shows that the same gene independently underwent bursts of gene copy number gains in two separate species. To investigate whether these amylase copy number gains occur in other mammalian lineages as well, we conducted a digital droplet polymerase chain reaction (ddPCR) based analysis on amylase gene copy numbers from 153 DNA samples across 44 species encompassing all major branches of the mammalian phylogenetic tree. In addition to humans and dogs, we discovered similar bursts (i.e., gain of more than one copy) of amylase gene copy number in house mice, brown rats, pigs, and boars (Figure 1, Table S1).

Given that copy number duplications occurred in different mammalian clades (Figure 1), we hypothesized that these events are a result of convergent evolution.
Another possible explanation would be that the ancestor of placental mammals had multiple copies of the amylase gene, which were subsequently lost in particular mammalian lineages. To distinguish between these two scenarios, we constructed a maximum likelihood tree of amylase coding sequences from available reference genomes (Figure 2A). Our results showed that amylase genes within a given species are more similar to each other than they are to those of other species, suggesting that the duplication of amylase genes occurred independently in each lineage.

Samuelson et al. previously reported that a retrotransposon (HERV_a_int) was inserted upstream of a new amylase gene duplicate (AMY1) in the ancestor of great apes7. This copy rapidly duplicated several times in humans, carrying along the retrotransposon1. Based on this, we asked if a similar signature accounts for the copy number burst found in the mouse genome. We chose the mouse because its reference genome is adequately complete for such an analysis. Indeed, we found a mouse-lineage-specific retrotransposon (L1Md_T) in the upstream region of 5 out of the 7 mouse amylase genes. The presence of the retrotransposon along with the duplicated copies parallels the situation in humans (Figure 2B). Since different retrotransposons accompanied the rapid gene copy number gains in humans and mice, we conclude that these bursts occurred independently and, thus, are potentially a result of convergent evolution.

By ddPCR analysis, we found 9-13 diploid copies of the amylase gene in brown rats (Table S1). Considering the close phylogenetic relationship of rats and mice, we expected that the high copy number of amylase had evolved in their rodent ancestor. However, the L1Md_T retrotransposon is mouse-lineage specific. Therefore, the duplications in rats likely occurred independently from those in mice.
We also confirmed the previous observations that dogs have gained at least 5 haploid copies of this gene over the short span of 5,000 years since their divergence from the wolf11. A similar process can be predicted for the pig and boar, whose genomes harbor 9-15 diploid copies of the amylase gene based on our analysis. In sum, our results suggest that amylase gene copy number gains have occurred recurrently in multiple, sometimes closely related, mammalian lineages.

Amylase expression in saliva was facilitated through recurrent gene copy number gains independently in different mammalian lineages

The ancestral form of amylase in mammals codes for a pancreatic enzyme. However, in certain mammalian species, amylase also became expressed in saliva13. In humans, this acquisition of salivary gland-specific expression has been well documented14. It has been shown that the aforementioned retrotransposon insertion along with the AMY1 duplicate in the ancestor of great apes is responsible for tissue-specific expression of this gene in salivary glands7. Previous studies also hypothesized that an independent but similar gene duplication event led to the salivary expression of amylase in mice8. It remains unresolved whether the mechanism that enabled expression of amylase in mouse saliva is similar to that determined for humans. Moreover, even though various reports showed salivary expression of amylase in different mammalian species12, a comprehensive and systematic analysis of salivary expression of amylase across the mammalian clade is still missing.

To fill these gaps in knowledge, we performed a screen across the mammalian phylogeny to investigate which lineages express amylase activity in saliva. We used a two-pronged approach, comprising a starch lysis plate assay (Figure 3A) and a high-sensitivity in-solution fluorescence-based assay (Figure 3B).
This approach provides the most comprehensive documentation of salivary amylase activity in mammals, encompassing 118 saliva samples across 20 species (Table S1). This is a significant contribution given that previous studies varied considerably in sample preparation, methods of analysis, and sensitivity12.

Our results showed that amylase activity in saliva is more widespread among mammals than previously thought (Figure 3B). In addition to species that were already known to express amylase in their saliva, we observed salivary activity in boars, dogs, deer mice, woodrats, and giant African pouched rats (Table S1). Our findings also suggest that salivary amylase activity in dogs varies from breed to breed (Figure S1, Table S1).

We surmised two competing scenarios to explain the observation that multiple mammalian lineages express amylase in their saliva. First, there could be independent gains of amylase expression in saliva spanning multiple lineages. Second, salivary expression of amylase could be an ancestral trait that was subsequently lost in most species. The above-described independent evolution of amylase gene copies in humans and mice supports the former hypothesis.

To further investigate this, we asked which of the mouse amylase copies is expressed in salivary glands by mapping available parotid salivary gland RNA-Seq data15 to the mouse reference genome (mm9) (Figure S2). We found that the copy annotated as mouse AMY1 (Figure 2) is expressed in salivary glands, and is likely responsible for salivary expression of amylase in mice, while the other amylase duplicates have negligible expression in salivary gland tissue (Figure S2). Mouse AMY1 has an amino acid sequence distinct from the other amylase copies in the mouse genome. This distinct sequence is shared with rats and other rodents (e.g., deer mouse, vole, Mongolian gerbil, golden hamster), indicating that the duplication event that led to the formation of AMY1 likely occurred in an ancestor of Muroidea.

Even though more work will be needed to understand the regulatory mechanisms through which amylase gained salivary expression in pigs, boars, dogs, multiple rodents, and some Old World monkeys, gene duplication appears to be the required initiating step. Indeed, we found that the overall amylase gene copy numbers in species correlate well with observable enzymatic activity in saliva (Figure 3C). In fact, we could not find a species that underwent a "burst" of amylase gene copy number that did not show concurrent salivary amylase activity. Importantly, previous studies surmised that dogs do not express salivary amylase2, while we show here that several dog breeds express substantial amounts of this enzyme (Figure S1). This variable expression of amylase in saliva among different dog breeds makes this species an ideal model to study the mechanism of gain-of-expression in a new tissue facilitated by gene duplication.
Overall, we conclude that the salivary activity of amylase has recurrently evolved in multiple mammalian lineages through gene duplication, where one or more of the duplicates have gained salivary gland expression.

Varied diets correlate with increased amylase copy number

For humans, it has been postulated that starch consumption exerted a positive adaptive force on maintaining high amylase copy numbers1. Furthermore, the rapid copy number increase in dogs has been associated with their change in diet during domestication2. Based on these previous studies, we hypothesized that gains in copy number and the associated gain of amylase expression in saliva are likely driven by starch consumption. When we compared the amylase copy numbers in mammals that consume specialized diets (strict carnivores and non-fruit-eating herbivores) to those with broad-ranged diets, we found that the latter harbor significantly higher copy numbers of the amylase gene (p = 2.1 × 10^-7, Mann-Whitney test, Figure 4A). We also found that the species consuming broad-ranged diets express significantly higher salivary amylase activity than those consuming specialized diets (p = 5.5 × 10^-4, Mann-Whitney test, Figure 4B).

Test, Figure 4A). For salivary expression of amylase, this difference was not significant. This could potentially be due to the fact that most, if not all, of the species that consume a broad-ranged diet also consume starch to varying degrees.

Next, we conducted a comparative investigation of amylase copy number and its salivary expression between human-interacting species and their closest evolutionary relatives in the wild. In dogs, which due to their commensalism with humans consume a higher amount of starch than wolves, we noted a substantial increase over the ancestral state, not only in amylase gene copy number2, but also in salivary expression of amylase (Figure 3C, Figure S1).
This increase was less substantial in species that already consumed starch in their ancestral state (e.g., mice and rats, which are granivorous). Along the same lines, we found no difference between domesticated pigs and wild boars. This could be explained by boars already consuming starch in amounts comparable to those of pigs. In fact, previous observations showed that boars and humans have similar starch-rich ancestral diets due to their consumption of underground starch-containing storage stem tissues known as tubers17.

Evolution of amylase in primates

To understand how the broader trend of amylase evolution is reflected in the primate phylogeny, we investigated multiple primate species, both for amylase gene copy number and salivary amylase activity (Figure 5). We confirmed previous studies which documented a duplication of the amylase gene in the ancestral population of the Catarrhini and another duplication in the ancestral population of the great apes8. Among Old World monkeys, we found additional amylase gene copies in rhesus macaques, baboons, and vervets. In contrast, we found no additional gene duplication in leaf-eating Old World monkeys (colobus, snub-nosed, and proboscis monkeys)18. Most New World monkey genomes that we tested carry 4 diploid amylase copies. Assuming that the ancestral state of this lineage had 2 copies, our results suggest another instance of gene copy number gain in the ancestor of New World monkeys. Moreover, we found an additional amylase copy in the capuchins, which consume more starch than other New World monkeys19,20. Next, we investigated lemurs, a primate outgroup to monkeys and great apes, and found that they indeed only harbor 2 diploid copies of the amylase gene (Figure 5).

can be found in Table S2.
The diet information for individual species was mostly acquired from Michigan Animal Diversity Web (https://animaldiversity.org/), unless other more specific studies were cited.

Genomic analysis

DNA was isolated from buccal swabs and saliva using a commercially available kit (ChargeSwitch® gDNA Buccal Cell Kit, Invitrogen). DNA extraction from blood and cell lines was conducted as described previously31. The DNA was analyzed by digital droplet PCR (ddPCR) to determine amylase gene copy number. For primer design we targeted amylase exonic sequences that are conserved among copies and between species. The primer sets used for each species are listed in Table S3. In most species, ddPCR results were highly concordant with copy number estimations based on BLASTx and BLASTp analysis (Figure S4). Only in certain species were disparities between our ddPCR results and existing databases noted (Table S1, Figure 3C).
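As an illustration of how a ddPCR readout is typically turned into a diploid copy number, the following minimal Python sketch applies the standard Poisson correction to droplet counts and normalizes the target to a reference locus. The function names, the example droplet counts, and the use of a two-copy reference locus are assumptions made for illustration; the text above does not specify the reference assay or the analysis software.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean number of template molecules per droplet.

    With p = positive / total, the mean occupancy is -ln(1 - p).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def diploid_copy_number(target_pos: int, target_total: int,
                        ref_pos: int, ref_total: int,
                        ref_copies: int = 2) -> float:
    """Estimate diploid copy number of the target relative to a reference locus.

    ref_copies is the assumed diploid copy number of the reference
    (hypothetical default: 2, i.e. a single-copy gene).
    """
    target = copies_per_droplet(target_pos, target_total)
    ref = copies_per_droplet(ref_pos, ref_total)
    if ref == 0:
        raise ValueError("reference assay has no positive droplets")
    return ref_copies * target / ref


if __name__ == "__main__":
    # Hypothetical droplet counts, not data from this study.
    estimate = diploid_copy_number(7500, 10000, 5000, 10000)
    print(f"estimated diploid copy number: {estimate:.2f}")
```

The Poisson correction matters most at high copy numbers, where many droplets hold more than one template molecule and a raw ratio of positive droplets would underestimate the true count.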

Phylogenetic analysis
Amino acid sequences translated from reference genomes for the amylase gene copies were downloaded from NCBI. Sequences were aligned and a phylogenetic output was generated using a custom Python code as described previously32. We constructed a maximum likelihood tree from the protein sequences using RAxML33, bootstrapping with 1000 replicates for branch support. Visualization was performed using FigTree34.
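The reasoning behind the tree-based test (copies within a species resembling each other more than copies from other species) can be illustrated with a much simpler calculation than a maximum likelihood tree. The sketch below is not the RAxML analysis and not the custom code cited above; it only compares mean pairwise identity within and between species on an already aligned set of sequences. The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    return sum(x == y for x, y in columns) / len(columns)


def mean_identities(aligned: dict[tuple[str, str], str]) -> tuple[float, float]:
    """Return (mean within-species identity, mean between-species identity).

    Keys are (species, copy_name); values are aligned protein sequences.
    """
    within, between = [], []
    for (key1, seq1), (key2, seq2) in combinations(aligned.items(), 2):
        bucket = within if key1[0] == key2[0] else between
        bucket.append(identity(seq1, seq2))

    def mean(values: list[float]) -> float:
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)


if __name__ == "__main__":
    # Toy aligned fragments, invented for illustration only.
    toy = {
        ("speciesA", "copy1"): "MKFFLLLSAI",
        ("speciesA", "copy2"): "MKFFLLLSAV",
        ("speciesB", "copy1"): "MKLVLLFSGI",
        ("speciesB", "copy2"): "MKLVLLFSGL",
    }
    w, b = mean_identities(toy)
    print(f"within-species: {w:.3f}, between-species: {b:.3f}")
```

A higher within-species than between-species identity is the pattern expected under independent, lineage-specific duplication, although gene conversion among paralogs could produce a similar signal, so a pattern like this is suggestive rather than conclusive.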

Measurement of amylase enzymatic activity
We used two methods to measure salivary amylase activity. First, we conducted a direct measurement of enzyme activity using a starch lysis agar plate (Figure 3A) following a previously described protocol35. In parallel, we used a high-sensitivity (detection limit 2 × 10^-3 U/ml) microtiter plate assay (EnzChek Ultra Amylase Assay Kit, Invitrogen) following the manufacturer's protocol and using α-amylase from human pancreas (Sigma) as the standard.

for a single copy, a reasonable assumption is that a gain should occur more frequently than a loss. Such assumptions related to the neutral copy number evolution result in a dependence of the mutation rate on the pre-existing copy number state. Thus, we implemented a modified version of CoMuS, where genealogies are simulated first, and thereafter mutations occur along the branches using a pre-order traversal of the tree: each mutation may affect the mutation rate on each subtree that has inherited it. We simulated neutral copy number variants for a total of 300 individuals, that is, 3 individuals for each of the 100 species of the guide phylogenetic tree (

All the input data are provided in Tables S1 and S4. We used custom scripts to analyze data and produce the figures primarily using the R statistical package.

Table S1 for breeds). Y-axis represents the salivary enzymatic activity for the same sample. A trendline was applied to show correlation. Red dots represent individual dog samples.
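The state-dependent neutral simulation described in the methods (a fixed genealogy, with mutations placed along branches in a pre-order traversal so that each change alters the rate in the subtree that inherits it) can be sketched in a few lines of Python. This is a toy illustration, not the modified CoMuS implementation. The `Node` class, the rule that the mutation rate scales linearly with copy number, the equal gain and loss probabilities above one copy, and the rule that a single copy can only be gained are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node of a fixed (already simulated) genealogy."""
    name: str
    branch_length: float = 0.0
    children: list["Node"] = field(default_factory=list)


def evolve_branch(copies: int, length: float, rate_per_copy: float,
                  rng: random.Random) -> int:
    """Evolve copy number along one branch with a state-dependent rate.

    Assumption: the total mutation rate is rate_per_copy * copies, so each
    gain raises the rate for the rest of the branch and for all descendants.
    """
    if rate_per_copy <= 0:
        return copies
    t = 0.0
    while True:
        t += rng.expovariate(rate_per_copy * copies)  # waiting time to next event
        if t >= length:
            return copies
        # Assumption: the last remaining copy cannot be lost.
        if copies == 1 or rng.random() < 0.5:
            copies += 1
        else:
            copies -= 1


def simulate(root: Node, root_copies: int = 1, rate_per_copy: float = 0.1,
             seed: int = 0) -> dict[str, int]:
    """Pre-order traversal: each node inherits the state reached by its parent."""
    rng = random.Random(seed)
    leaves: dict[str, int] = {}
    stack = [(root, root_copies)]
    while stack:
        node, inherited = stack.pop()
        copies = evolve_branch(inherited, node.branch_length, rate_per_copy, rng)
        if not node.children:
            leaves[node.name] = copies
        # Push children in reverse so the leftmost child is visited first.
        for child in reversed(node.children):
            stack.append((child, copies))
    return leaves


if __name__ == "__main__":
    tree = Node("root", 0.0, [
        Node("A", 1.0),
        Node("B", 1.0, [Node("C", 0.5), Node("D", 0.5)]),
    ])
    print(simulate(tree, rate_per_copy=1.0, seed=1))
```

Because a child's starting state is whatever its parent reached, a gain early in the tree raises the rate for every lineage below it, which is the dependence on pre-existing copy number that the methods describe.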