An epigenetic aging clock for dogs and wolves

Several articles describe highly accurate age estimation methods based on human DNA-methylation data. It is not yet known whether similar epigenetic aging clocks can be developed based on blood methylation data from canids. Using Reduced Representation Bisulfite Sequencing, we assessed blood DNA-methylation data from 46 domesticated dogs (Canis familiaris) and 62 wild gray wolves (C. lupus). By regressing chronological age on the resulting CpGs, we defined highly accurate multivariate age estimators for dogs (based on 41 CpGs), wolves (67 CpGs), and both combined (115 CpGs). Age-related DNA methylation changes in canids implicate similar gene ontology categories as those observed in humans, suggesting an evolutionarily conserved mechanism underlying age-related DNA methylation in mammals.


INTRODUCTION
Technological breakthroughs surrounding genomic platforms have led to major insights about age related DNA methylation changes in humans [1][2][3][4][5][6][7][8][9]. In mammals, DNA methylation represents a form of genome modification that regulates gene expression by serving as a maintainable mark whose absence marks promoters and enhancers. During development, germline DNA methylation is erased but is established anew at the time of implantation [10]. Abnormal methylation changes that occur because of aging contribute to the functional decline of adult stem cells [11][12][13]. Even small changes of the epigenetic landscape can lead to robustly altered expression patterns, either directly by loss of regulatory control or indirectly, via additive effects, ultimately leading to transcriptional changes of the stem cells [14].
Several studies describe highly accurate age estimation methods based on combining the DNA methylation levels of multiple CpG dinucleotide markers [15][16][17][18]. We recently developed a multi-tissue epigenetic age estimation method (known as the epigenetic clock) that combines the DNA methylation levels of 353 epigenetic markers known as CpGs [17]. The weighted average of these 353 epigenetic markers gives rise to an estimate of tissue age (in units of years), which is referred to as "DNA methylation age" or as "epigenetic age". DNA methylation age is highly correlated (r=0.96) with chronological age across the entire lifespan [8,19,20]. We and others have shown that the human epigenetic clock relates to biological age (as opposed to simply being a correlate of chronological age), e.g. the DNA methylation age of blood is predictive of all-cause mortality even after adjusting for a variety of known risk factors [21][22][23][24][25]. Epigenetic age acceleration (i.e. the difference between epigenetic and chronological age) is associated with lung cancer [26], cognitive and physical functioning [27], Alzheimer's disease [28], centenarian status [25,29], Down syndrome [30], HIV infection [31], Huntington's disease [32], obesity [33], menopause [34], osteoarthritis [35], and Parkinson's disease [36]. Moreover, we have demonstrated the human epigenetic clock applies without change to chimpanzees [17] but it no longer applies to other animals due to lack of sequence conservation.

AGING
Many research questions and preclinical studies of antiaging interventions will benefit from analogous epigenetic clocks in animals. To this end we sought to develop an accurate epigenetic clock for dogs and wolves. Dogs are increasingly recognized as a valuable model for aging studies [37,38]. Dogs are an attractive model in aging research because their lifespan (around 12 years) is intermediate between that of mice (2 years) and humans (80 years), thus serving as a more realistic model for human aging than most rodents. Dogs have already been adopted to model multiple human diseases in gene mapping studies (e.g. squamous cell carcinoma [39], bladder cancer [40]) and cancers are often the cause of age-related mortality in domestic dogs [41].
The maximum lifespan of dogs is known to correlate with the size of their breed [42][43][44]. Based on previous studies in humans [17], we expect that age acceleration (the difference between epigenetic age and chronological age) correlates with longevity. We hypothesize that dogs whose epigenetic age is larger than their chronological age are aging more quickly, while those with a negative age acceleration are aging more slowly. Thus, we would expect to see a correlation between age acceleration and dog breed size.
We also sought to build an epigenetic clock for gray wolves because alternative age estimation methods have limitations. Gray wolf age estimates have traditionally been made from tooth wear patterns, cranial suture fusions, closure of the pulp cavity, and cementum annuli [45,46]. Based on tooth wear patterns, the age structure of a wolf pack is typically skewed towards younger animals (<1-4 years old), with few individuals >5 years of age [46,47]. Sexual maturity is reached between 10 months and 2 years of age [48,49]. In a wild social carnivore, group living often results in high mortality rates. Gray wolves live on average 6-8 years in natural populations, but can live up to 13+ years in captivity with increased reproductive success [45,46].

RESULTS
Data set
We used Reduced Representation Bisulfite Sequencing to generate DNA methylation data from 46 domestic dogs (26 females, 20 males) and 62 gray wolves from Yellowstone National Park (26 females, 36 males). The age distribution of wolves is skewed towards younger animals relative to dogs (dogs: mean=5 years, median=4, range=0.5-14; wolves: mean=2.7 years, median=2, range=0.5-8). This reflects earlier mortality in natural populations compared to domestic animals; in addition, age estimates for wild specimens lack precision. Additionally, we included 729 humans (388 females, 341 males) with a large age range (mean=47.4 years, range=14-94).
Based on calculations and criteria described in the Methods section, we constructed a matrix of high confidence methylation levels across 108 canid blood samples. Previous work has shown that there are significant locus-specific methylation differences between dogs and wolves [50]. Here, however, we sought to identify a clock that correlated with age across both canid species; thus, we removed the methylation sites that showed species-specific divergence. This yielded a set of 252,240 CpG sites for our modeling efforts. Of these, 105,521 could be mapped to syntenic CpGs in the human genome (hg19) for functional annotation purposes. Further, a subset of 9,017 sites are measured by the human Illumina 450K array, which allowed us to test for conservation of age correlations between these evolutionarily divergent species (humans, dogs, and wolves).
From these input sets of 10s to 100s of thousands of CpGs, regression models were obtained using an algorithm (see Methods) that selects a much smaller number of CpGs by allowing regression coefficients to go to zero. As the space of possible models is combinatorially vast, there is no guarantee of global optimality of the resulting models, and there are likely a large number of models that would yield comparable results. Thus, we make no assertions of biological significance for the exact identity or number of CpGs in a given model used here.

Conservation of age-correlated methylation between dogs and wolves
To initially gauge whether it might be possible to create a DNAm age clock for a multi-species group (i.e. canids), we looked at the conservation of age-correlated methylation in the two canid species. The global correlation between the age effects across the two species is small in magnitude (r=0.07, Fig. 1A) which could be due to the following reasons: i) it could reflect poor accuracy of the chronological age estimate in wolves, ii) it could reflect the relatively small sample size, iii) it could reflect that wolves tended to be younger than dogs in our study, i.e. the chronological age distributions differed.
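The comparison of age effects can be illustrated with a minimal sketch. It assumes methylation matrices with samples in rows and CpGs in columns, measured at the same CpGs in both groups. The function names and the simulated data are hypothetical and are not taken from the study.

```python
import numpy as np

def age_correlations(meth, age):
    """Pearson correlation between age and each CpG column of `meth` (samples x CpGs)."""
    meth = np.asarray(meth, dtype=float)
    age = np.asarray(age, dtype=float)
    m = meth - meth.mean(axis=0)
    a = age - age.mean()
    denom = np.sqrt((m ** 2).sum(axis=0) * (a ** 2).sum())
    # CpGs with zero variance have no defined correlation; report 0 for them.
    return np.divide(a @ m, denom, out=np.zeros(meth.shape[1]), where=denom > 0)

def conservation_of_age_effects(meth_a, age_a, meth_b, age_b):
    """Correlate the per-CpG age correlations of two groups."""
    r_a = age_correlations(meth_a, age_a)
    r_b = age_correlations(meth_b, age_b)
    return float(np.corrcoef(r_a, r_b)[0, 1])

# Simulated data (hypothetical): both groups share the same per-CpG age slopes.
rng = np.random.default_rng(0)
n_cpg = 200
slopes = rng.normal(0, 0.01, n_cpg)
dog_age = rng.uniform(0.5, 14, 46)
wolf_age = rng.uniform(0.5, 8, 62)
dog_meth = 0.5 + np.outer(dog_age, slopes) + rng.normal(0, 0.05, (46, n_cpg))
wolf_meth = 0.5 + np.outer(wolf_age, slopes) + rng.normal(0, 0.05, (62, n_cpg))
r_conservation = conservation_of_age_effects(dog_meth, dog_age, wolf_meth, wolf_age)
```

In this simulation the shared slopes produce a clearly positive value. Noisier age labels or a narrower age range in one group pull the value towards zero, in line with the explanations listed above.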

Conservation of age-correlated methylation between canid species and human
To test for more distant evolutionary conservation of age effects on DNA methylation between canids and humans, we computed age correlations over a set of 729 human blood methylation array samples [6] and examined syntenic locations between the canine (canFam3) and human (hg19) genomes as described in Methods. While the subset of measured DNA-methylation sites common to all 3 species is relatively small (~9000 CpGs), we see that the conservation of age correlation between "canids" (pooled samples of dogs and wolves) and humans is statistically significant, though small in magnitude (r=0.20, p=1×10^-81, Fig. 1B). This conservation holds for dogs alone (r=0.20, p=6×10^-85) but is weaker for wolves alone (r=0.11, p=1×10^-25, Fig. 1C, 1D).
The statistically significant correlation between dogs and humans is remarkable because the two data sets were generated on different platforms (RRBS versus the Illumina 450K array).

Leave one out estimate of the accuracy of the canid epigenetic clock
DNAm age (also referred to as epigenetic age) was calculated for each sample by fitting an elastic net regression to the methylation profiles of all other samples and predicting the age of the sample of interest. In the course of our work, we found that pre-selecting subsets of CpGs was helpful and computationally expedient. This was done by computing correlations between methylation and age and taking only those CpGs with absolute correlation above 0.3. These pre-selection steps were also performed in a leave-one-out manner for all cross-validated results presented here. Predictions (in years) were obtained by taking the exponential of the output of the epigenetic aging model, because ages were log-transformed prior to regression. We see a strong linear relationship between DNAm age and true age for our 108 canid samples (Fig. 2A). The correlation between predicted and actual ages using leave-one-out cross-validation was 0.8 and the median absolute error was 0.8 years. The average number of CpGs in the 108 individual regression models was 122.3.
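The leave-one-out procedure can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the regressor is a plain least-squares stand-in rather than the elastic net, and all function names and the simulated data are hypothetical. The point of the sketch is that CpG pre-selection happens inside each fold and that predictions are exponentiated back to years.

```python
import numpy as np

def loo_dnam_age(meth, age, fit, min_abs_r=0.3):
    """Leave-one-out age predictions (years) with in-fold CpG pre-selection.

    `fit(X, y)` must return a function that maps a (1 x k) matrix to predictions.
    """
    meth = np.asarray(meth, dtype=float)
    age = np.asarray(age, dtype=float)
    n = len(age)
    predicted = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        X, a = meth[train], age[train]
        # Pre-select CpGs on the training samples only, so the held-out
        # sample never influences which CpGs enter the model.
        Xc, ac = X - X.mean(axis=0), a - a.mean()
        denom = np.sqrt((Xc ** 2).sum(axis=0) * (ac ** 2).sum())
        r = np.divide(ac @ Xc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
        keep = np.abs(r) > min_abs_r
        # Regress log(age) on the selected CpGs, then exponentiate the prediction.
        model = fit(X[:, keep], np.log(a))
        predicted[i] = np.exp(model(meth[i:i + 1, keep])[0])
    return predicted

def least_squares_fit(X, y):
    """Stand-in regressor (ordinary least squares), NOT the elastic net."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

# Simulated data (hypothetical): 5 age-related CpGs plus 50 unrelated ones.
rng = np.random.default_rng(1)
true_age = rng.uniform(0.5, 14, 40)
informative = 0.3 + 0.15 * np.log(true_age)[:, None] + rng.normal(0, 0.02, (40, 5))
unrelated = rng.uniform(0, 1, (40, 50))
pred = loo_dnam_age(np.hstack([informative, unrelated]), true_age, least_squares_fit)
median_abs_error = float(np.median(np.abs(pred - true_age)))
loo_r = float(np.corrcoef(pred, true_age)[0, 1])
```

Selecting CpGs on all samples before the loop would leak information from the held-out sample and make the accuracy estimate look better than it is.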
To examine the effects of pooling two species of canids, we performed the same prediction (DNAm age calculation) procedure on dogs and wolves separately. We find that the performance of these models is lower than that of the canid model, with dogs showing a correlation of r=0.65 and wolves r=0.54 (Fig. 2B). The average numbers of CpGs in the dog-only and wolf-only models were 58.5 and 62.9, respectively. These models, on average, contain fewer CpGs than the combined canid models as the smaller number of samples in each subset provides less statistical support for the regression algorithm.
As another means of assessing the robustness of a multi-species clock, we built one clock for each species using all samples in that species and then applied it to all samples in the other species. These clocks have similar correlations to the dog-only or wolf-only clocks, close to 0.6, each using a single regression model, with 41 and 67 CpGs for the dog and wolf models, respectively (Fig. 2C).

Final epigenetic aging clocks based on all animals
To determine the accuracy of our final models, we fit the penalized elastic net regression to the set of dogs (41 CpGs), wolves (67 CpGs), and then both combined (115 CpGs) (Fig. 2D). The penalized regression routine ("elastic net") uses an internal cross-validation to select the optimal penalty parameter. While the entire set of canids and the subset of domesticated dogs could be fit exactly (r=1.0), the wolf data alone were slightly less amenable.

Age acceleration as a function of dog size
With the largest variation in size among terrestrial vertebrates, the domestic dog not only spends most of its life in an environment and lifestyle like those of its human companions, but also develops many diseases analogous to human disease [51,52]. Though dog breeds are diverse in nearly every aspect, smaller breeds are known to live longer than larger breeds [42][43][44]. Recent genomic surveys have identified nine loci linked to canine size determination, with seven of these loci supporting growth, cellular proliferation, and metabolism [53]. Of these, the growth hormone IGF1 has not only been of historic interest as a causative locus controlling body size in mice [54][55][56], but also has the most significant association with body size [57,58].

We found a correlation of 0.25 between age acceleration and breed weight (Fig. 3). Given the limited sample size for dogs (n=46), this correlation did not reach statistical significance at the standard threshold of 0.05. However, we expect that a study with a larger cohort might have sufficient power to show that these trends are in fact significant.
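The calculation behind this comparison can be sketched as follows. The values for five dogs are invented for illustration, and the permutation test is one assumed way of attaching a p-value to the correlation; the significance test used in the study is not specified here.

```python
import numpy as np

def age_acceleration(dnam_age, chrono_age):
    """Epigenetic age minus chronological age; positive means older than expected."""
    return np.asarray(dnam_age, dtype=float) - np.asarray(chrono_age, dtype=float)

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    observed = abs(np.corrcoef(x, y)[0, 1])
    hits = sum(
        abs(np.corrcoef(x, rng.permutation(y))[0, 1]) >= observed
        for _ in range(n_perm)
    )
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical values for five dogs (not data from the study).
dnam = [3.1, 6.4, 9.0, 2.2, 11.5]
chrono = [3.0, 6.0, 10.0, 2.0, 10.0]
weight_kg = [8.0, 30.0, 5.0, 12.0, 45.0]
accel = age_acceleration(dnam, chrono)
r = float(np.corrcoef(accel, weight_kg)[0, 1])
p = permutation_p_value(accel, weight_kg)
```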

Functional significance of DNAm age sites
As described in Methods, mapping of canid CpGs to the human genome yielded 105,521 sites. We utilized this entire set as "background" and selected subsets of CpGs based on the statistical significance of their correlation with age as "foreground". These subsets are not meant to correspond exactly to any of the particular regression models, but to capture the general association of age-related CpGs (from which the regression models are drawn) and biological function inferred via proximity of the CpGs to known genes.
We also partitioned the CpGs into groups with positive (gain of methylation) or negative (loss of methylation) correlation with age, as these two groups have been noted to correspond to separate classes of biomolecular function in previous work [17,59]. As negatively correlated sites generally partition to distal parts of gene bodies or intergenic regions, they tend to have limited annotation. Conversely, positively correlated sites localize to promoter regions of genes for which there is generally more detailed annotation. To ensure the selection of statistically significant age-related CpGs, we performed a multiple-testing correction [60] on the p-values and selected only those with adjusted values <= 0.05. The annotation tool (GREAT) accesses a large and diverse number of databases and function ontologies. Here, we report those results edited down to non-redundant highlights. We found that a subset of 91 negatively correlated CpGs (0.1% of total) localized to 125 genes that function in cellular organization and the Notch pathway, an evolutionarily conserved cell-to-cell signaling pathway important for cell proliferation and differentiation (Table 1A). The subset of 90 positively correlated CpGs (0.1% of total) localized to 71 genes with vital roles in embryonic organismal development and chromatin states (Table 1B). In summary, the canid genes whose DNA-methylation changes are most strongly correlated with age (both negatively and positively) are critical developmental genes, namely those that determine cell fate and organ development in the embryonic stage of life, as has been noted in previous work with DNA-methylation in humans [17,59].
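The selection of foreground CpGs can be sketched as follows. The text does not name the multiple-testing procedure behind [60], so the Benjamini-Hochberg adjustment used here is an assumption, and the function names and example values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed correction method)."""
    p = np.asarray(p_values, dtype=float)
    n = len(p)
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

def split_by_direction(r, p_values, threshold=0.05):
    """Indices of significant CpGs that gain or lose methylation with age."""
    r = np.asarray(r, dtype=float)
    significant = benjamini_hochberg(p_values) <= threshold
    gain = np.flatnonzero(significant & (r > 0))
    loss = np.flatnonzero(significant & (r < 0))
    return gain, loss

# Hypothetical age correlations and raw p-values for four CpGs.
gain_idx, loss_idx = split_by_direction(
    r=[0.5, -0.4, 0.1, -0.6], p_values=[0.01, 0.04, 0.9, 0.005]
)
```

The two index sets would then be passed separately to the annotation step, with the full set of mapped CpGs as background.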

DISCUSSION
More broadly, our study demonstrates that DNA-methylation correlates with age in dogs and wolves as it does in humans and related species. This age-dependence of DNA-methylation is conserved at syntenic sites in the respective genomes of these canid species as well as in more distantly related mammalian genomes such as that of humans. Strikingly, the age associations of syntenic CpGs are significantly conserved (r=0.20) even though the data were generated on different platforms (RRBS vs Illumina methylation array). Overall, our study demonstrates that dogs age in a similar fashion to humans when it comes to DNA methylation changes.
Race/ethnicity and sex have a significant effect on the epigenetic age of blood in humans [61]. Further, genetic loci have been found that affect epigenetic aging rates in humans [62]. It will be interesting to determine whether sex effects can also be observed in dogs and whether genetic background relates to the ticking rate of the canid clock. Based on our preliminary blood samples of 108 canid specimens, including both dogs and wolves, we accurately measured the methylation status of several hundred thousand CpGs. We demonstrate that these data can produce highly accurate age estimation methods (epigenetic clocks) for dogs and wolves separately. By first removing sites that were variable between dogs and wolves, we could also establish a highly accurate epigenetic clock for all canids (i.e. dogs and wolves combined). This clock allows us to estimate the age of half the canids to within a year.
Our study has several limitations, including the following. First, the sample size was relatively low (n=108). There is no doubt that more accurate clocks could be built based on larger sample sizes. Second, we only focused on blood tissue. Future studies could explore other sources of DNA such as buccal swabs. Third, the chronological ages of the wolves are probably not very accurate since they were estimated by the investigators.
In human studies, we have found that lifestyle factors (e.g. diet) have at best a weak effect on cell-intrinsic epigenetic aging rates measured by the 353 CpG based clock [63]. By contrast, extrinsic measures of epigenetic age acceleration, which also capture age-related changes in blood cell composition, relate to lifestyle factors that are known to be protective in humans (e.g. consumption of fish, vegetables, and moderate amounts of alcohol, as well as higher levels of education). Biomarkers of metabolic syndrome were associated with increased DNAm age but we could not detect a protective effect of metformin in this observational study [63]. The presented canid aging clocks open up the possibility of assessing dietary and pharmacological interventions on canid aging. The genome coordinates for the CpGs and corresponding regression coefficients of our final canid age estimator and of our dog age estimator can be found in Table 2 and Table 3, respectively.

Table 2. Genome coordinates and coefficient values for predicting a log (base e) transformed version of chronological age. These coefficients were found by regressing a log-transformed version of age on the RRBS DNA-methylation measured from 108 canid blood samples. Since chronological age was log-transformed prior to regression, it is important to exponentiate the age estimate from this model to arrive at age estimates in units of years. We provide the mean methylation and Pearson correlation with age for each individual CpG. Where possible, we identify, via synteny to the human genome, genes that are proximal to the CpGs in our models. Numbers in parentheses are the distance in bases to the Transcription Start Site of the gene. Additionally, we note those genes with experimentally inferred relevance to cellular identity (pluripotency).
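Applying such a table of coefficients to a new sample can be sketched as below. The CpG coordinates, coefficient values, and intercept are invented for illustration and are not entries from the tables; the sketch also assumes that the model includes an intercept term.

```python
import math

def dnam_age(methylation, coefficients, intercept):
    """DNAm age in years from a log-age linear model.

    `methylation` and `coefficients` are dicts keyed by CpG coordinate;
    methylation values are frequencies between 0 and 1.
    """
    log_age = intercept + sum(
        coef * methylation[cpg] for cpg, coef in coefficients.items()
    )
    # The model predicts log(age), so exponentiate to obtain years.
    return math.exp(log_age)

# Hypothetical two-CpG model and sample.
toy_coefficients = {"chr1:1000": 1.2, "chr5:2500": -0.8}
toy_intercept = 0.5
toy_sample = {"chr1:1000": 0.75, "chr5:2500": 0.25}
estimated_age = dnam_age(toy_sample, toy_coefficients, toy_intercept)
```

Here the linear predictor is 0.5 + 1.2×0.75 − 0.8×0.25 = 1.2, so the estimated age is exp(1.2), roughly 3.3 years. Omitting the exponentiation would report 1.2, which is a log-age and not an age in years.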

Genes experimentally identified as targets of pluripotency factors and the Polycomb repressor complex [69,70]

METHODS
Reduced representation bisulfite sequencing (RRBS)
We obtained previously published canine RRBS methylation data as CGmap files (see Janowitz, Koch, et al. 2016) [50]. Both wolf and dog data were aligned to the canine genome (canFam3).

Data processing
For each CpG site in each sample we estimated the methylation frequency as the number of methylated mapped read counts over the total mapped read counts and computed a corresponding 95% confidence interval from the binomial distribution [64]. For inclusion in our analysis, we required that each CpG site had confident methylation frequencies in at least 95% of samples. Confidence was defined as having a confidence interval smaller than 0.63 (roughly equivalent to requiring a minimum of 15 mapped reads at that site). For the remaining elements in the data matrix, we used the frequencies calculated regardless of confidence or imputed missing values using R package "softImpute" with type option "ALS" [65].
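The site filter can be sketched as follows. The exact binomial interval used in [64] is not specified in the text, so the Wilson score interval here is an assumption; the read depth that corresponds to a width of 0.63 depends on that choice. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def wilson_interval(methylated, total, level=0.95):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if total == 0:
        return 0.0, 1.0  # no reads: the frequency is completely undetermined
    z = NormalDist().inv_cdf(0.5 + level / 2)
    p = methylated / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def is_confident(methylated, total, max_width=0.63):
    """True if the confidence interval is narrower than `max_width`."""
    low, high = wilson_interval(methylated, total)
    return (high - low) < max_width

def keep_site(counts, min_fraction=0.95):
    """Keep a CpG if enough samples have confident methylation frequencies.

    `counts` is a list of (methylated, total) read counts, one pair per sample.
    """
    confident = sum(is_confident(m, t) for m, t in counts)
    return confident >= min_fraction * len(counts)
```

For example, a site covered by 100 reads in 19 of 20 samples and uncovered in one passes the 95% requirement, while the same site with two uncovered samples does not.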

Culling species-specific differential methylation
To exclude species-specific differential methylation as a confounder, we first constructed a methylation matrix with no dog samples with ages greater than the maximum observed wolf age (8 years

Table 3. Genome coordinates and coefficient values for predicting a log (base e) transformed version of chronological age. These coefficients were found by regressing a log-transformed version of age on the RRBS DNA-methylation measured from 46 domesticated dog blood samples. Since chronological age was log-transformed prior to regression, it is important to exponentiate the age estimate from this model to arrive at age estimates in units of years. We provide the mean methylation and Pearson correlation with age for each individual CpG. Where possible, we identify, via synteny to the human genome, genes that are proximal to the CpGs in our models. Numbers in parentheses are the distance in bases to the Transcription Start Site of the gene. Additionally, we note those genes with experimentally inferred relevance to cellular identity (pluripotency).
Genes experimentally identified as targets of pluripotency factors and the Polycomb repressor complex [69,70]:
a: genes identified by ChIP on chip as targets of the Polycomb protein EED in human embryonic stem cells.
b: genes possessing the trimethylated H3K27 (H3K27me3) mark in their promoters in human embryonic stem cells, as identified by ChIP on chip.
c: Polycomb Repression Complex 2 (PRC) targets, identified by ChIP on chip on human embryonic stem cells as genes that possess the trimethylated H3K27 mark in their promoters and are bound by SUZ12 and EED Polycomb proteins.
d: genes identified by ChIP on chip as targets of the Polycomb protein SUZ12 in human embryonic stem cells.

Regression
Penalized regression models were built using glmnet [66]. Because we wanted to reduce the number of predictors from potentially hundreds of thousands of input CpGs, we used the "elastic net" version of glmnet, corresponding to an alpha parameter of 0.5. For all results reported here, the internally cross-validated routine (cv.glmnet) was used to automatically select the optimal penalty parameter.
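To illustrate how an elastic net penalty drives most coefficients to exactly zero, the following sketch implements a basic coordinate-descent solver in NumPy. It is a simplified stand-in, not glmnet: it does not standardize predictors, it takes the penalty strength `lam` as given instead of selecting it by cross-validation, and the simulated data are hypothetical. The objective minimized is (1/2n)·||y − b0 − Xb||² + lam·[alpha·||b||₁ + (1 − alpha)/2·||b||₂²].

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z towards zero by t; values within [-t, t] become exactly 0."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, lam, alpha=0.5, n_iter=200):
    """Coordinate-descent elastic net; returns (intercept, coefficients)."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean          # centring handles the intercept
    col_sq = (Xc ** 2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue                      # constant column carries no signal
            resid += Xc[:, j] * beta[j]       # remove predictor j's contribution
            rho = Xc[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            resid -= Xc[:, j] * beta[j]       # add the updated contribution back
    return y_mean - x_mean @ beta, beta

# Simulated data (hypothetical): only the first two of 30 predictors matter.
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, (60, 30))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(0, 0.05, 60)
_, beta_sparse = elastic_net(X, y, lam=0.05)
_, beta_null = elastic_net(X, y, lam=10.0)
n_selected = int(np.count_nonzero(beta_sparse))
```

A moderate penalty keeps the two informative predictors and zeroes most of the others, while a very large penalty zeroes every coefficient. Cross-validation over a grid of penalties, as cv.glmnet does, chooses a value between these extremes.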

Functional annotation and multi-species synteny
Canid methylation sites (using coordinates from the canFam3 draft genome) were first mapped to the human genome (hg19) where possible so that functional analysis tools with access to the most complete and detailed annotations could be utilized. This mapping was made using the "liftOver" tool and associated human to canine chain files available at the UCSC Genome Browser [67]. The human genome coordinates were then used as input to the Genomic Regions Enrichment of Annotations Tool (GREAT) [68].

CONFLICTS OF INTEREST
The Regents of the University of California is the owner of a provisional patent application directed at this invention for which the authors are named inventors.