Ageing transcriptome meta-analysis reveals similarities and differences between key mammalian tissues

By combining transcriptomic data with other data sources, inferences can be made about functional changes during ageing. Thus, we conducted a meta-analysis on 127 publicly available microarray and RNA-Seq datasets from mice, rats and humans, identifying a transcriptomic signature of ageing across species and tissues. Analyses on subsets of these datasets produced transcriptomic signatures of ageing for brain, heart and muscle. We then applied enrichment analysis and machine learning to functionally describe these signatures, revealing overexpression of immune and stress response genes and underexpression of metabolic and developmental genes. Further analyses revealed little overlap between genes differentially expressed with age in different tissues, despite ageing differentially expressed genes typically being widely expressed across tissues. Additionally, we show that the ageing gene expression signatures (particularly the overexpressed signatures) of the whole meta-analysis, brain and muscle tend to include genes that are central in protein-protein interaction networks. We also show that genes underexpressed with age in the brain are highly central in a co-expression network, suggesting that underexpression of these genes may have broad phenotypic consequences. In sum, we show numerous functional similarities between the ageing transcriptomes of these important tissues, along with unique network properties of genes differentially expressed with age in both protein-protein interaction and co-expression networks.

AGING to analyse the results. This gave an ageing signature consistent with previous analyses. In addition, we performed analyses on tissue-specific subsections of these datasets for brain, heart and muscle revealing some novel tissue specific differences in network connectivity.

Most significant ageing gene expression signatures
The global meta-analysis across various tissues in mice, rats and humans identified 449 genes overexpressed with age and 162 underexpressed with age. This is considerably more than the results of de Magalhães, et al. [4], where 56 overexpressed and 17 underexpressed genes were identified. For the tissue-specific analyses, in brain 147 genes were overexpressed and 16 genes were underexpressed, in heart 35 genes were overexpressed and 5 genes were underexpressed, and in muscle 49 genes were overexpressed with 73 genes underexpressed. The top-5 overexpressed genes for each analysis are presented in Table 1 and the top-5 underexpressed genes for each analysis are presented in Table 2.
The most significantly overexpressed genes in this meta-analysis were principally involved in immune responses and inflammation, particularly for the global and the brain-specific analyses. Several complement proteins were overexpressed in these analyses, with C1QA appearing at the top of both the global and brain-specific analyses; C1QC likewise appears in both lists. The top genes in the heart-specific results include the structural protein gene MGP, genes involved in amine metabolism and oxidation-reduction processes (MAOA and VAT1), as well as the iron and copper metabolism gene CP. In muscle the top overexpressed gene was CDKN1A, a cell cycle regulator. Other interesting genes overexpressed in muscle include EFEMP1, a gene involved in eye morphogenesis that has demonstrated involvement in premature ageing-like phenotypes in mice, possibly playing a role in fascial structural integrity [6], and that has recently been shown to be overexpressed in aged mouse aorta [7], and CHRNA1, which codes for a muscle acetylcholine receptor subunit.
A common theme across the top underexpressed genes is mitochondrial metabolism. In the global results, the top underexpressed gene is UQCRFS1, a subunit of mitochondrial complex III, while in heart NDUFS7, a component of mitochondrial complex I, is the second most significantly underexpressed gene. Another mitochondrial complex I subunit, NDUFC1, was the third most significantly underexpressed gene in muscle. The brain is the only tissue studied that did not see an underexpression of mitochondrial genes. Indeed, all the top-5 genes underexpressed in the brain signature have clear roles in neuronal signalling and/or development. Complete lists of all significant genes for all the analyses can be found in Supplementary Tables 3-10, while intersections between the results from each analysis can be found in Supplementary Table 27.
Interestingly, several genes with known involvement in ageing-modulating pathways were differentially expressed; for instance, IGF1 was underexpressed, while IGF2R and RICTOR were overexpressed in the global meta-analysis.

Comparison with GenAge signature
The results from the complete meta-analysis were first compared to the results from the 2009 microarray meta-analysis available on the GenAge database [4]. These two meta-analyses used similar methods, and this new analysis identified 66% and 56.3% of the genes identified previously for the over- and underexpressed categories, respectively. The overlap for each class of differential expression (over- and underexpressed) between this and the previous meta-analysis is shown in Figure 1.
There was significant overlap between these results and the GenAge signature for both over- (Figure 1A) and underexpressed (Figure 1B) genes (hypergeometric test, p<1e-10 for both comparisons), as expected given the large overlap of studies included in both analyses.
Further, the overlap between the global and tissue-specific analyses was tested for overexpressed and underexpressed genes separately using pairwise hypergeometric tests (Bonferroni corrected). The overlaps between the analyses are shown in Figure 2.
For overexpressed genes (Figure 2A) there was significant overlap between the global analysis and all three tissues (hypergeometric test, p<1e-10 for all comparisons). The brain analysis also overlapped significantly with the heart (hypergeometric test, p=1.43e-2) and muscle (hypergeometric test, p=3.17e-3).
For underexpressed genes (Figure 2B) the global analysis only significantly overlapped with the brain (hypergeometric test, p=1.44e-8) and the muscle (hypergeometric test, p<1e-10) analyses. No other overlaps were significant.
For both over- and underexpressed genes, there were no genes differentially expressed in all four analyses, nor in both heart and muscle.

Overlap with other ageing databases
In addition to the GenAge ageing expression signature, this meta-analysis was compared to other gene lists hosted on the Human Ageing Genomic Resources (HAGR). These were the GenAge database of genes suspected to be involved in human ageing [8], the GenDR database of genes differentially expressed with dietary restriction in model organisms [5] and the LongevityMap database of human genes with genetic variants associated with longevity [9].
There was a significant overlap of the genes differentially expressed with age in the complete meta-analysis with both human GenAge genes and the genes with longevity-associated variants found in LongevityMap; however, there was no significant overlap with the dietary restriction signature from GenDR, or with the human homologues of mouse genes that can modulate longevity in either direction (Table 3).
Table note: The value given between brackets in the 'p-value' column header is the p-value threshold at which FDR <0.05.

Functional classification analysis
The detected ageing expression signature was tested for GO enrichment, in addition to the use of data mining methods to identify the most important GO terms that could be used in the assignment of each gene to a differential expression class. The purpose of this dual analysis was to provide functional descriptions from two very different methods, hopefully providing a robust description of functional changes with age.
GO enrichment analysis was performed for each meta-analysis (global, brain, heart, muscle) on the over- and underexpressed expression signatures separately and the significantly enriched GO terms were ranked by p-value (Supplementary Tables 11-18).
The machine learning analysis was likewise conducted on each tissue, and the GO terms determined to be predictive of each expression class (overexpressed, underexpressed or unchanged) were ranked in terms of descending average probability (Supplementary Tables 19-26).
To provide a comprehensive picture of the functional changes associated with the ageing expression signature, top-ranked terms that overlap between these two analyses are presented below for GO terms associated with overexpressed (Table 4) and underexpressed genes (Table 5), for each tissue. The criterion for inclusion in these tables is that the term was significantly enriched (p<0.05) and present in the top-20 terms for the prediction of the given expression class. The data mining precision was prioritised over enrichment significance, and so the terms have been ranked in the following tables according to their precision value. Note that although many of the precision values for the top-ranked terms are relatively low, they are much higher than the class label's relative frequency (given in the column header), which is the precision that a classifier would get by randomly classifying the genes.
[Table rows: Collagen type XV alpha 1 chain, 9.30e-7; CKMT2, Creatine kinase, mitochondrial 2, 9.30e-7]
Terms describing the overexpressed genes were predominantly related to immune responses; for instance, "Regulation of dendritic cell differentiation" was the best predictor of overexpression in both the global and brain analyses, with average precisions of 0.613 and 0.298, respectively, while also being significantly enriched in both cases. Likewise, "Complement activation, classical pathway", another immune term, was highlighted in both these analyses, while in brain "Positive regulation of podosome assembly" and "Negative regulation of leukocyte differentiation" were both identified strongly by both analysis methods.
Another theme amongst the overexpressed genes that crosses tissues is cellular response functions, particularly in relation to stress. For instance, terms raised by the global analysis include "Cellular response to cadmium ion" and "Cellular response to zinc ion", while in heart "Cellular response to vitamin" and "Iron ion homeostasis" were identified, and in muscle "Positive regulation of reactive oxygen species metabolic process" was determined to be of interest.
Terms describing the underexpressed genes were less precise and fewer in number than those describing overexpressed genes, due to the lower numbers of underexpressed genes overall (excepting muscle). The global meta-analysis is dominated by metabolic and developmental terms, with the metabolic theme being shared with muscle (e.g. "Oxaloacetate metabolic process" was considered important in both) while the developmental theme was shared with the brain. Interestingly, the machine learning and enrichment analyses shared little specific agreement regarding genes underexpressed in the brain, with only two terms being agreed on as interesting by both methods; this is likely due to the low number of genes underexpressed in the brain (16).

Tissue specificity of the ageing transcriptome
To determine if there was an association between tissue specificity and the ageing expression signature, the τ index of tissue specificity was calculated for all genes studied in the meta-analysis, using the expression data from the GTEx project. This yielded a bimodal distribution of gene specificity, typical of this measure (Supplementary Figure 2).
There was a weak negative association detected between differential expression with age and high […]
Table 3. The overlap shown includes all differentially expressed genes from the expression datasets, regardless of the direction of expression change (611 genes total). Comparisons made are with the GenDR dietary restriction expression signature, the human entries of GenAge which includes genes for which evidence exists of their involvement in ageing, human homologues of genes shown to be pro- or anti-longevity in mice, and genes with longevity associated variants from LongevityMap.
Table 4. Presented here are a selection of terms for each tissue which were both significantly enriched in the given gene list and present in the top-20 terms, ranked by precision, for the prediction of a gene as being overexpressed by the Random Forest model. The value given between brackets in the Precision column header is the class label's relative class frequency, i.e. the precision that a classifier would get by randomly classifying the genes.
Table 5. Presented here are a selection of terms for each tissue which were both significantly enriched in the given gene list and present in the top-20 terms, ranked by precision, for the prediction of a gene as being underexpressed by the Random Forest model. The value given between brackets in the Precision column header is the class label's relative class frequency, i.e. the precision that a classifier would get by randomly classifying the genes. It should be noted that the list of genes underexpressed in the heart was too small for a meaningful analysis and so has been left out.
Betweenness centrality in the co-expression network (Supplementary Figures 11B-14B, 16) was only changed in the brain signature, where, as with degree, the underexpressed genes (median=3.98e-4) had a higher betweenness than unchanged genes (p=0.034, median=7.88e-4), although in this case there was no significant difference between over- and underexpressed genes.

Evolutionary conservation of ageing signature genes
There were no significant differences between dN/dS ratios (the ratio of nonsynonymous to synonymous substitutions between the species) of genes over- or underexpressed with age when compared to either unchanged genes or to the opposite expression category, for either human-mouse or human-rat ratios (Supplementary Figures 18, 19). The median values tended towards a lower dN/dS in those genes underexpressed with age relative to those overexpressed with age, with the median dN/dS being 0.096 and 0.093 in underexpressed genes and 0.12 and 0.11 in overexpressed genes for human-mouse and human-rat comparisons, respectively.

DISCUSSION
There was a significant overlap between this meta-analysis and the results of de Magalhães, et al. [4] (Figure 1) for both over- and underexpressed genes. This overlap, although significant, is not as extensive as might have been expected, potentially due to the differing biases in microarray and RNA-Seq results [10], or the heterogeneity demonstrated in expression patterns of the mammalian immune response [11]. Despite this, the functional themes of the detected genes were much the same, with overexpressed genes being broadly immune and underexpressed genes being broadly metabolic.
Enrichment analysis was coupled with data mining to identify GO terms that robustly describe the processes associated with the altered genes. Examining the top-ranked GO terms that these methods agreed on (Tables 4 and 5) reveals some interesting differences and similarities between the studied tissues. The global analysis of 127 datasets is typical of previous large-scale expression studies and meta-analyses [4,12,13], showing overexpression of immune genes, stress responses and proteolysis (Table 4A), as well as underexpression of metabolic and energy metabolism genes. The preponderance of inflammatory and stress response genes in particular is reminiscent of the inflammageing hypothesis [14], which argues that ageing is caused by steadily failing responses to stress, in particular responses to the increased antigenic load that comes with age. Coupled with the overexpression of immune and inflammatory genes, the underexpression of metabolic genes is implicated not just in ageing, but in several ageing-related diseases, for instance Alzheimer's [15] and Duchenne muscular dystrophy [16].
A similar profile was seen in the brain with immune categories dominating the top-ranked terms, including "Regulation of dendritic cell differentiation", which was also the most predictive GO term of overexpression with age in the global analysis. There is some evidence suggesting a causative role of immune processes in brain ageing; for example, astrocytosis, the abnormal proliferation of the cells responsible for (among many other functions) regulating inflammation in the central nervous system [17], is associated with loss of myelin in Alzheimer's disease, Parkinson's disease and ageing [18]. It is possible that changes between different brain regions exist that could not be detected due to the study design. Indeed, different regions of the brain do appear to suffer age-related decline at different rates [19].
Differential ageing between tissues was seen in the other analyses as well, and it is unclear to what extent tissues age at the same rate. Epigenetic measures have shown some minor differences in the rate of ageing between breast and other tissues [20], and environmental effects accelerate age-related changes in exposed tissues; for instance, skin ageing is influenced by smoking [21] and air pollution [22]. The extent to which such changes can be considered increases in the rate of ageing is questionable, however [23]; it could simply be that extrinsic stressors cause damage similar to that of ageing. The data presented here suggest some differences in transcriptomic ageing between tissues, particularly between the overexpressed signatures of the brain and the heart/muscle, with the brain showing changes in immune categories while the heart and muscle show changes in local homeostasis and protein catabolism (Table 4).
These categories are consistent with previous analyses of ageing transcription signatures. de Magalhães, et al. [4] likewise identified several overexpressed immune and xenobiotic terms, with metabolic terms being enriched in the underexpressed genes; while the more recent GTEx consortium analysis of human ageing has also reported that genes underexpressed with age in multiple tissues are consistently enriched for metabolic, in particular mitochondrial, GO terms [12].

An interesting result was the significant underexpression of some immune genes (MLF1, FKBP4) in the meta-analysis (Table 2A). Dysregulation of the immune system may in part explain why the immune response becomes less effective with age; indeed, old mice have been shown to have increased heterogeneity of transcriptional response to immune stimulus in their CD4+ T cells, with results suggesting that they are less able to upregulate adaptive response programs when necessary [24].
Of the other HAGR databases tested, GenDR and the longevity modulating mouse genes from GenAge did not show a significant overlap (Table 3). This is possibly due to the inclusion of human data in this meta-analysis, whereas the dietary restriction signature hosted on GenDR is based on mouse, rat and pig [5], and the longevity modulating mouse genes may not always be transferable to other species, or necessarily be differentially expressed. Alternatively, although dietary restriction slows ageing, it may do so by affecting pathways that are not commonly altered with age and that perhaps modulate ageing at a deeper level. While there is evidence that dietary restriction is able to reverse many ageing transcriptional changes [25,26], it appears that the lifespan extension may be caused by an upregulation of stress responses and repair mechanisms [27] and thus dietary restriction may combat ageing by improving defenses to ageing-related damage, rather than altering the ageing processes themselves. Additionally, dietary restriction may weaken the adaptive immune system in aged organisms [28], whereas the opposite might be expected if it were simply reversing or slowing ageing processes.
The significant overlap between the ageing expression signature and both GenAge and LongevityMap is interesting because the genes recorded in those databases are genes with either evidence of involvement in ageing or genes with genetic association to longevity, neither of which would necessarily be expected to be altered with age. One caveat is that a large number of immune genes were identified in these expression signatures, and several of the largest contributing studies in LongevityMap were explicitly studying variation in immune genes and how it affects ageing; as such, LongevityMap would be expected to skew towards immune and inflammation genes.
These data suggest the most detectable ageing expression changes are those that occur in genes expressed across tissues, with a weak negative association observed between genes being tissue-specific (τ>0.8) and being differentially expressed with age for overexpressed genes in the global and brain analyses, and underexpressed genes in the global and muscle analyses (Supplementary Table 2). This result is corroborated by other studies; for instance, in mice, genes differentially expressed with age tend to be differentially expressed across multiple tissues, although gene expression changes in some tissues, for example the liver, do tend to be more tissue-specific [29]. Further, the AGEMAP project was able to cluster tissues into three modes of ageing: neural, vascular and steroid responsiveness [30]. This suggests that while there may be distinct ageing transcriptional profiles between tissues, there are sets of tissues which age by similar mechanisms, with similar expression changes. It should be noted that the nature of this meta-analysis means that only the most consistently differentially expressed genes were detected. As such, there is potentially a bias towards genes that are both highly expressed and expressed across tissues, since these will have been detected in more studies.
Interestingly, while the underexpressed signatures focused on metabolic and developmental genes, both heart and muscle showed distinct overexpressed signatures relative to the similar profiles observed in the global and brain analyses. The heart, for instance, shows a focus on cellular responses, including the response to vitamin, and on iron homeostasis (Table 4C). Iron homeostasis deregulation with age has been shown to occur in several tissues and is a possible driver of oxidative stress in aged tissues, with the activation of iron detoxification proteins being a possible adaptive measure to such changes [31]. The muscle shows overexpression of cell-cycle mediators (Table 4D), which, while typically associated with cellular senescence and the prevention of cancer, are also involved in the repair of DNA damage, apoptosis, autophagy, immune responses and metabolism [32]. Indeed, apoptosis in skeletal muscle may be one of the causes of fiber loss that results in sarcopenia [33].
Considering the PPI network, the higher degree centrality of genes differentially expressed with age in most tissues is not especially surprising. Several of the identified genes are well studied, and PPI data favours proteins of high abundance [34] and with high publication coverage [35]. Despite this, coupling the higher degree centrality with the higher betweenness centrality seen in the same tissues (Supplementary Figures 8, 9) and the higher closeness centrality seen in differentially expressed genes from the global analysis (Supplementary Figure 10) provides evidence that genes differentially expressed with age tend to be highly connected within PPI networks. This suggests possible regulatory roles: such genes are potential bottlenecks to the flow of information through the network, making them interesting targets for intervention to study the regulation of these networks.
In the co-expression network, degree centrality (Supplementary Figure 15) was lower in underexpressed genes in the global and muscle analyses, yet it was higher in underexpressed genes in the brain analysis. This trend was mirrored by betweenness centrality (Supplementary Figure 16), which was higher in genes underexpressed in the brain despite not being changed in any other signature. Likewise, while closeness centrality tended to be lower in both over- and underexpressed genes across the analyses (Supplementary Figure 17), it was higher in genes underexpressed in the brain. The high centrality of both over- and underexpressed genes in the PPI network, but particularly the high centrality of the underexpressed brain genes in the co-expression network, is interesting since high centrality in biological networks can indicate importance in disease, with highly central genes potentially having dramatic or even lethal effects when targeted [36]. Further, co-expression in the brain is disrupted by diseases such as Alzheimer's disease [37], making these genes potentially important in the pathogenesis of ageing-related brain disease.
To summarise: 1) the ageing expression signature in humans, mice and rats can be predominantly described as an overexpression of genes associated with immune, stress and proteolytic processes coupled with an underexpression of genes associated with metabolic, particularly mitochondrial, and developmental processes; 2) genes differentially expressed with age tend to be more highly connected in the protein-protein interaction network, particularly in the global and brain signatures; 3) genes underexpressed with age in the brain are highly central in the co-expression network, suggesting these underexpressed genes may have significant effects and, we hypothesize, play a role in cognitive ageing; and 4) the most detectable genes differentially expressed with age tend to be expressed across a broad range of tissues. We provide the differential expression results used in the meta-analysis (Supplementary Datasets), along with the tau scores (Supplementary Table 2) of tissue specificity calculated from the GTEx database, as a resource for the community. These data will be most useful as a validation dataset, reflecting as they do the most commonly observed genes differentially expressed with age; however, they may also prove useful for further discovery, for instance as features for further data mining studies, combining these annotations with other databases or fresh experimental data.

Preparation of the dataset
In total, 127 datasets were downloaded from AGEMAP [30] and the Gene Expression Omnibus (GEO) [38] (Supplementary Table 1), covering a total of 37 tissues and cell types. AGEMAP contains the results of microarray experiments on mice at various ages, while the GEO datasets downloaded were identified using the search string: "(("age"[Subset Variable Type]) or "development stage"[Subset Variable Type]) and "mammals" [organism]", returning 335 microarray and RNA-Seq datasets. These were manually filtered to remove non-single channel arrays, single-pathway arrays and species that were not of interest. Mutant or diseased samples were likewise removed. Next, RNA-Seq datasets containing raw reads were normalised as reads per kilobase per million mapped reads (RPKM), and all datasets were log2 transformed if they were not already supplied in that form.
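As an illustration of the normalisation step, the following minimal Python sketch computes RPKM for a single gene and applies a log2 transform. The function names and the example counts are hypothetical, and the pseudocount of 1 is an assumption made here to keep zero counts finite; the handling of zeros is not described in this section.

```python
import math

def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

def log2_transform(value, pseudocount=1.0):
    # Assumption: a pseudocount avoids log2(0); the source does not
    # state how zero expression values were treated.
    return math.log2(value + pseudocount)

# Hypothetical gene: 100 reads, 1 kb long, in a library of 1 million reads.
example = rpkm(read_count=100, gene_length_bp=1000, total_mapped_reads=1_000_000)
print(example, log2_transform(example))
```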
Linear regression was carried out on each dataset to determine differential expression with age (Equation 1):
$$Y_{ij} = \beta_0 + \beta_1 \mathrm{Age}_i + \epsilon_{ij} \qquad (1)$$
where $Y_{ij}$ is the expression level of gene $j$ in sample $i$, $\mathrm{Age}_i$ is the age at which sample $i$ was taken and $\epsilon_{ij}$ is the error term. Coefficients $\beta_0$ and $\beta_1$ were estimated by least squares, and significance was calculated using an F-test.
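The per-gene regression of expression on age described above can be sketched as follows. This is a minimal standard-library illustration rather than the code used in the study: the function name and the example values are hypothetical, and only the F statistic is computed. Its p-value would be taken from an F distribution with 1 and n - 2 degrees of freedom, which requires a statistics library.

```python
def fit_age_regression(ages, expression):
    """Least-squares fit of expression = b0 + b1 * age.

    Returns (b0, b1, f_stat), where f_stat has 1 and n - 2 degrees of freedom.
    """
    n = len(ages)
    mean_x = sum(ages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in ages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ages, expression))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    # Residual and regression sums of squares
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(ages, expression))
    ss_reg = sum((b0 + b1 * x - mean_y) ** 2 for x in ages)
    f_stat = ss_reg / (ss_res / (n - 2)) if ss_res > 0 else float("inf")
    return b0, b1, f_stat

# Hypothetical log2 expression of one gene in five samples of increasing age.
b0, b1, f_stat = fit_age_regression([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(b0, b1, f_stat)
```

A positive slope b1 with a significant F-test would mark the gene as overexpressed with age in that dataset, and a negative slope as underexpressed.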
A cumulative binomial test was then used to identify genes that were significantly differentially expressed across the datasets, taking the probability of success as the probability that any gene was not over-/underexpressed in any dataset, the number of trials as the number of datasets in which the given gene was detected, and the number of successes as the number of datasets in which the given gene was not detected as significant. Thus the test asks, for a given gene: "given the number of times the gene was tested across all the datasets, the number of times it was significantly differentially expressed across all the datasets, and the probability of seeing any gene differentially expressed, what is the probability that this gene is differentially expressed more often than we expect to see by chance?". False discovery rate (Q) was controlled by randomising the datasets 10,000 times, repeating the analyses with these randomised data, and then carrying out a linear regression on the simulated results to estimate the p-value cut-off at which Q<0.05.
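The cumulative binomial test can be illustrated with a short Python sketch. It is written in terms of the number of datasets in which a gene was significant, which gives the same tail probability as the formulation above: if Y is the number of non-significant datasets and X = n - Y the number of significant ones, then P(Y <= n - k) equals P(X >= k). The function name and the example numbers (12 significant results out of 40 datasets, with a background rate of 0.05) are hypothetical.

```python
from math import comb

def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical gene: detected in 40 datasets, significant in 12 of them,
# with an assumed 5% chance of any gene being significant in a dataset.
p_value = binomial_upper_tail(k=12, n=40, p=0.05)
print(p_value)
```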
The meta-analysis was repeated three times, using only the datasets from specific tissues. Thus, four analyses were carried out, a global analysis of all tissues (127 datasets) and tissue-specific analyses of brain (29 datasets), heart (9 datasets) and muscle (26 datasets).
A summary of the method is given in Supplementary Figure 1.

Determination of tissue specificity
The expression data from version 7 of the GTEx project [39] was downloaded and used to calculate a τ index for each gene. The τ index is an indicator of how specifically or broadly expressed a gene is, with a τ of 1 indicating expression specific to only one tissue, and a τ of 0 indicating equal expression across all tissues [40]. The τ index for a given gene can be calculated as shown in Equation 2:
$$\tau = \frac{\sum_{i=1}^{N} (1 - x_i)}{N - 1} \qquad (2)$$
where $N$ is the number of tissues being studied and $x_i$ is the expression profile component for a given tissue, normalised by the maximal component value for that gene (i.e. the expression of that gene in the tissue it is most highly expressed in).
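A minimal Python sketch of the τ calculation follows, assuming the standard form of the index, τ = Σ(1 − x_i)/(N − 1), with each x_i being the expression in tissue i divided by the gene's maximum expression across tissues. The function name and the example profiles are hypothetical.

```python
def tau_index(expression_by_tissue):
    """Tissue-specificity index: 0 = uniform expression, 1 = single tissue."""
    n = len(expression_by_tissue)
    max_expr = max(expression_by_tissue)
    if n < 2 or max_expr <= 0:
        raise ValueError("need at least two tissues and non-zero expression")
    # Each component is normalised by the gene's maximal expression.
    return sum(1 - x / max_expr for x in expression_by_tissue) / (n - 1)

print(tau_index([5, 5, 5, 5]))  # equal expression in all tissues
print(tau_index([0, 0, 8, 0]))  # expression confined to one tissue
```

Under the threshold used in this study, a gene would be treated as tissue-specific when its τ exceeds 0.8.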

Comparison with relevant ageing gene lists
The overlap between the global signature and relevant ageing gene lists was tested using the hypergeometric test [41] with all the genes included in the meta-analysis as the background set. When comparing to the GenAge expression signature, over- and underexpressed genes were considered separately.
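The overlap test can be sketched in Python as the upper tail of a hypergeometric distribution: the probability of seeing at least the observed number of shared genes when one list is drawn at random from the background. The function names and the example sizes are hypothetical, and the Bonferroni helper simply multiplies by the number of comparisons.

```python
from math import comb

def hypergeom_overlap_pvalue(overlap, size_a, size_b, background):
    """P(X >= overlap) when size_b genes are drawn from a background
    of `background` genes, size_a of which belong to list A."""
    upper = min(size_a, size_b)
    total = comb(background, size_b)
    tail = sum(
        comb(size_a, k) * comb(background - size_a, size_b - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical lists: 30 and 40 genes sharing 10, from 1,000 background genes.
p = hypergeom_overlap_pvalue(overlap=10, size_a=30, size_b=40, background=1000)
print(p, bonferroni(p, n_tests=6))
```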

Tissue specificity of ageing genes
The association between differential expression with ageing according to the meta-analysis and tissue specificity (defined as having a τ index >0.8 based on the GTEx data) was tested using a chi-squared test, and the phi coefficient was calculated to indicate the strength of the correlation. Association was tested for both over- and underexpressed genes, for all four meta-analyses (Bonferroni corrected).
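For a 2x2 contingency table, the chi-squared test and the phi coefficient can be computed together, as in the sketch below. The table layout (rows: tissue-specific or not; columns: differentially expressed or not), the function name and the counts are assumptions for illustration, and no continuity correction is applied. With one degree of freedom the chi-squared survival function reduces to erfc(sqrt(chi2/2)), so only the standard library is needed.

```python
from math import erfc, sqrt

def chi2_phi_2x2(a, b, c, d):
    """Chi-squared test and phi coefficient for the table [[a, b], [c, d]].

    Returns (chi2, p_value, phi); no continuity correction is applied.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("each row and column needs a non-zero total")
    phi = (a * d - b * c) / sqrt(denom)
    chi2 = n * phi**2
    p_value = erfc(sqrt(chi2 / 2))  # survival function for 1 degree of freedom
    return chi2, p_value, phi

# Hypothetical counts: rows = tissue-specific yes/no, columns = DE yes/no.
chi2, p_value, phi = chi2_phi_2x2(20, 980, 430, 8570)
print(chi2, p_value, phi)
```

A negative phi in this layout corresponds to tissue-specific genes being less likely to be differentially expressed with age.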

Enrichment analysis
The topGO package (v2.28.0) [42] was used in the R programming environment using the weight01 algorithm [43] and Fisher's exact test to calculate enrichment of GO terms. Genes were mapped to the GO-2017-03-29 release since this is the release utilised by the GO.db package version in Bioconductor 3.5 [44].

Rule-based precision analysis
To complement the enrichment analysis, Random Forest (RF) machine learning models were used to identify the most important GO terms for the classification of genes as over- or underexpressed with age. The RF algorithm builds many Random Trees (RT) during its training (model construction) phase. Each node in a RT contains a condition that splits the instances (the genes) into two subsets according to the values of the selected feature (in our case, the presence or absence of a GO term in a gene), creating two child nodes. The RF algorithm aims to select features that best split genes (based on their change in expression label) into the two groups, so that genes of different class labels (over- vs. underexpressed) are assigned as much as possible to different groups. Next, the algorithm re-runs the previously described split procedure in the two newly generated groups until some user-defined condition is met.
To predict the class label of an unseen gene, for every RT, the conditions in the tree (starting in the root node) are matched against the gene's features (GO terms from GO-2017-03-14) until a leaf node is reached. When the instance (gene) reaches a leaf node, the most frequent class in the node is selected to be the prediction of the tree. The final prediction of the whole RF model is defined by the simple voting of all RTs.
We used Rule-Based Precision (RBP) [45] to measure the importance of features used by the model. Briefly, to measure the RBP we build several RFs, each of which in turn comprises many RTs. For each tree and feature (a GO term), we identify all paths in the decision trees from root to leaf that use the positive value of the GO term feature, that is, paths in the tree that "capture" a gene only if the GO term annotates that gene. Then, the method calculates the overall precision of these paths, and uses this precision to rank the GO terms regarding predictive power. The main motivation for using the RBP measure is that it was designed specifically to reward "positive" feature values (GO term annotations), rather than "negative" feature values (lack of GO term annotations), since the former are more reliable. Indeed, a negative feature value denotes lack of evidence, rather than evidence for the absence of a given gene function.
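The idea behind RBP can be sketched on a toy example. The code below is a simplified reading of the description above and not the published definition [45], which may differ in detail: tree paths are represented as hypothetical dictionaries of conditions with a predicted class, the GO identifiers are placeholders, and precision is pooled over all paths that require the GO term to be present.

```python
def path_covers(path, gene_terms):
    """True if the gene's GO terms satisfy every condition on the path."""
    return all((term in gene_terms) == required for term, required in path["conditions"])

def rule_based_precision(paths, genes, go_term):
    """Pooled precision of all paths that use the positive value of go_term.

    Returns None when no such path covers any gene.
    """
    covered = correct = 0
    for path in paths:
        if (go_term, True) not in path["conditions"]:
            continue  # only paths that require the annotation are rewarded
        for terms, label in genes:
            if path_covers(path, terms):
                covered += 1
                correct += label == path["predicted"]
    return correct / covered if covered else None

# Hypothetical root-to-leaf paths and annotated genes (placeholder GO IDs).
paths = [
    {"conditions": [("GO:A", True)], "predicted": "over"},
    {"conditions": [("GO:A", False), ("GO:B", True)], "predicted": "under"},
]
genes = [
    ({"GO:A"}, "over"),
    ({"GO:A", "GO:B"}, "over"),
    ({"GO:A"}, "unchanged"),
    ({"GO:B"}, "under"),
]
print(rule_based_precision(paths, genes, "GO:A"))
print(rule_based_precision(paths, genes, "GO:B"))
```

Note that the second path uses the negative value of GO:A, so it contributes nothing to the score for GO:A, in line with the motivation of rewarding only positive annotations.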

Network analysis
The human PPI network was downloaded from BioGRID version 3.3.123 [46] and non-physical interactions were removed, leaving 219,240 interactions. Additionally, an unweighted co-expression network of highly correlated genes from the GeneFriends RNA-seq co-expression map (V3.1) was used [47]. The betweenness, closeness and degree (normalised by dividing by the maximum possible degree of a graph, n-1, where n is the number of nodes in graph G) of each gene in these networks were calculated using the 'networkx' Python library [48], and the average betweenness, closeness and degree of the genes in each expression signature was determined. The centrality measures of over- and underexpressed genes were then compared to their opposite category, as well as to the non-differentially expressed genes, by pairwise Mann-Whitney U tests (Bonferroni corrected).
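The degree normalisation described above (degree divided by n - 1) can be illustrated without external dependencies. The sketch below is a standard-library stand-in, not the analysis code; the edge list and node names are hypothetical. The networkx function `degree_centrality` applies the same normalisation.

```python
def degree_centrality(edges):
    """Degree of each node divided by n - 1 for an undirected, unweighted graph."""
    neighbours = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-interactions
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    n = len(neighbours)
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

# Hypothetical four-protein interaction network.
centrality = degree_centrality([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
print(centrality)
```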

dN/dS analysis
To identify any differences in the evolutionary conservation of genes differentially expressed with age, the dN/dS ratios for comparison between humans and mice, and humans and rats were obtained from Ensembl Biomart release 96, keeping only those genes with 1 to 1 ortholog homology type between the relevant species and high orthology confidence. These dN/dS ratios compare the rates of synonymous and nonsynonymous substitutions between species for a given gene, giving an idea of the type of selection that gene may be under, if any [49]. The distribution of dN/dS scores was compared by pairwise Mann-Whitney U tests (Bonferroni corrected) across all comparisons between genes overexpressed with age, underexpressed with age and unchanged with age.
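A pairwise comparison of this kind can be sketched as below. This is an illustrative standard-library version that uses the large-sample normal approximation without tie or continuity corrections, so it is an assumption-laden stand-in for a statistics package and is not reliable for very small groups; the function names and the dN/dS values are hypothetical.

```python
from math import erfc, sqrt

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    Returns (u_statistic_for_x, p_value).
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        for k in range(i, j):
            ranks[k] = (i + j + 1) / 2  # average rank for tied values
        i = j
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    z = (u_x - n1 * n2 / 2) / sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u_x, erfc(abs(z) / sqrt(2))

def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Hypothetical dN/dS values for two gene groups.
over = [0.12, 0.15, 0.09, 0.20, 0.11]
under = [0.10, 0.08, 0.07, 0.13, 0.09]
u, p = mann_whitney_u(over, under)
print(u, bonferroni(p, n_tests=3))
```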

Data availability
The data that support the findings of this study are available in the supplementary material of this article, which is available on the Integrative Genomics of Ageing Group AgeingSignatures2020_supplementary GitHub repository (https://github.com/maglab/AgeingSignatures2020_supplementary). These data were derived from the resources listed in Supplementary Table 1.

AUTHOR CONTRIBUTIONS
DP contributed to the meta-analysis, conducted bioinformatics analyses and wrote the manuscript. FF conducted data mining analyses. AD conducted the meta-analysis. JPM and AAF conceived the project. All authors read, critically revised and approved the final manuscript.