Dataset generated in a systematic review and meta-analysis of biological clocks as age estimation markers in animal ecology

The dataset comprises a comprehensive systematic review and meta-analysis exploring the utility of biological clocks as age estimation markers in the context of animal ecology. The systematic review adhered to PRISMA guidelines and employed optimized Boolean search strings to retrieve relevant studies from Scopus and Dimensions databases. A total of 78 methylation studies and 108 telomere studies were included after rigorous screening. Effect sizes were computed, and statistical transformations were applied when necessary, ensuring compatibility for meta-analysis. Data from these studies were meticulously collected, encompassing statistical measures, study attributes, and additional biological information. The dataset comprises several folders, carefully organized to facilitate access and understanding. It contains raw and processed data used in the systematic review and meta-analysis, including Boolean search strings, database search results, citation network analysis data, PRISMA statements, extracted study data, and input data for meta-analysis. Each folder's contents are described in detail, ensuring clarity and reusability. This dataset aggregates primary research studies spanning diverse ecosystems and taxa, providing a valuable resource for researchers, biodiversity managers and policymakers. This dataset offers a wealth of information and analysis potential for researchers studying age estimation markers in animal ecology, serving as a robust foundation for future investigations and reviews in this evolving field.


Value of the Data
• These datasets represent the most up-to-date collection of studies using two biological clocks based on epigenetics, methylation and telomere length, to determine age in animals, and define the state of the art of the field using a systematic method that enables both transparency and reproducibility.
• These data help gain novel insights into key attributes of study design, such as method used, sample size, and tissue type, and may guide the design of future studies.
• The side-by-side comparison of the overall utility of the two methods for age estimation clarifies the effectiveness of each and may assist scientists when choosing a method for studying ageing in natural populations.
• While telomere length has been in use for two decades, methylation as a biomarker for age is a relatively new and rapidly evolving field that will likely require an updated review in a few years, in which case this dataset serves as a good baseline for subsequent reviews.
• Authorship of included studies for both methylation and telomeres was used to create and benchmark a new measure of bias for meta-analysis, Author Bias, calculated by a custom Python script called ABCal to quantify the effects of highly represented authors.
• Study attributes, such as year of publication and study location, were also used to benchmark scientometric plotting functionality using the same tool.

Data Description
The full dataset consists of several folders and subfolders containing the raw and/or processed data used in the systematic review and meta-analysis. The first folder is named "literature search" and contains three subfolders. The first contains two text files with the Boolean search strings used to search the databases, while the second and third contain the raw results from the searches of the Scopus and Dimensions databases as comma-separated value files. The column headings for the search result files are described in Tables 1 and 2. The second folder is named "Citation Network Analysis" and contains two subfolders, one for raw data and one for results. The data folder contains two files for the methylation studies and two for the telomere studies: the comma-separated value file contains the studies selected for inclusion after screening, in the same format as Scopus output, while the text file contains the same data transformed to match the input requirements for citation network analysis. The column headings for the transformed data are described in Table 3. The results subfolder contains the images for the citation network analysis. The third folder, named "PRISMA Statements", contains the two PRISMA statements for methylation and telomere studies, respectively. The fourth folder is named "Extracted Data" and contains two subfolders. The first subfolder is titled "Extracted Data -Studies" and contains a spreadsheet workbook with three tabs, one for methylation studies and two for telomere studies; column headings and associated data are described in Table 4. The second subfolder is titled "Extracted Data -Other" and contains data that were extracted from individual studies where relevant statistics could not be retrieved from the full text. The fifth folder is titled "Meta-Analysis" and contains the final dataset, in comma-separated value format, that was used to perform the meta-analysis. The column headings and associated data are described in Table 5. The same folder also contains the related forest plots that were generated as output. The sixth and final folder is named "Code" and contains the R code used to perform the meta-analysis. This file is in the standard R format (.R) and contains labels and annotations that explain which steps are performed by specific lines of code.

Literature search and study screening for systematic review
Literature was searched and screened using systematic review methods (Figs. 1 and 2) per the preferred reporting items for systematic reviews and meta-analysis (PRISMA) statement [2,3], in line with PRISMA Ecology and Evolution guidelines [4] and Cochrane best practices [5]. Literature was searched between September 2022 and June 2023 on two databases: Scopus (www.scopus.com) and Dimensions (www.dimensions.ai). Databases were searched using an optimized Boolean search string derived from the PICO terms for the aim and objectives of the review. For methylation studies the search string was: ("Epigenetics" OR "Methylation") AND ("age" OR "aging") AND ("determination" OR "model") AND ("Animals" OR "wild"). For telomere studies the search string was: ("Telomeres") AND ("age" OR "aging") AND ("shortening" OR "lengthening") AND ("Animals" OR "wild").
Initial results were subjected to further automated screening, using additional search terms as constraints to make the results more specific and to exclude human studies. For both the Scopus and Dimensions searches, the final set of results for screening was exported in comma-separated value (CSV) format. Identified sources were imported (citation and abstract) into the Mendeley citation manager (www.mendeley.com) for manual screening. Studies that passed preliminary screening were sought for full-text retrieval and added to the imported references if they were not already included. A total of 78 studies were included in the final review for methylation and 108 for telomeres.
Included studies were further analysed through citation network analyses. For the Scopus database, the results were merged and reformatted with the R package Scopus2CitNet 0.1.0.0 in RStudio 1.4.1106 [6], running R 4.0.5 [7]. The final included studies from the Scopus and Dimensions results were subsequently visualized by year in CitNetExplorer 1.0.0 and by group in VOSviewer 1.6.16 [8], keeping only those papers that overlapped in terms of references cited (bibliographic coupling) for the largest connected components (Figs. 3 and 4).
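The structure of the two search strings (synonyms joined with OR inside each concept, concepts joined with AND) can be illustrated with a minimal Python sketch. The function name `build_boolean_query` and the list layout are hypothetical and not part of the dataset; only the search terms themselves come from the text above.

```python
def build_boolean_query(term_groups):
    """Join synonyms with OR inside each group, then join groups with AND."""
    clauses = []
    for group in term_groups:
        quoted = " OR ".join(f'"{term}"' for term in group)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


# Term groups copied from the search strings reported in the text.
methylation_groups = [
    ["Epigenetics", "Methylation"],
    ["age", "aging"],
    ["determination", "model"],
    ["Animals", "wild"],
]
telomere_groups = [
    ["Telomeres"],
    ["age", "aging"],
    ["shortening", "lengthening"],
    ["Animals", "wild"],
]

methylation_query = build_boolean_query(methylation_groups)
telomere_query = build_boolean_query(telomere_groups)
print(methylation_query)
print(telomere_query)
```

Keeping the terms as grouped lists makes it straightforward to add or remove a synonym and regenerate the string for a later update of the review.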

Data collection and processing for meta-analysis
Data were collected from studies that reported models using methylation or telomeres as a biomarker to infer the age of animals. Relevant statistics and study attributes were retrieved from tables, figures, or the main text. For methylation studies this included key reported statistics such as the correlation coefficient and p-values for models. For telomere studies, several different statistical tests were applied beyond linear models. As such, the reported statistics compiled included correlation coefficients and p-values as well as t-values, F-values, and z-values. Where clear statistical measures were not available, WebPlotDigitizer 4.6 [9] was used to extract data from graphs or plots to repeat the reported tests and derive the relevant statistics. Data were extracted from figures for one methylation study, for snow leopards [10], and two telomere studies, for baboons [11] and chimpanzees [12], for which the correlation coefficients were computed using linear regression. For telomere studies, data were also retrieved from the online supplements to compute relevant statistics for rainbow trout [13] and stonechat species [14], for which the F-statistics were computed using ANOVA. Key study characteristics such as species, sample size, tissue type, and empirical method were also collected. Additional attributes such as lifespan, karyotype, and genome size were retrieved from online databases [15][16][17][18].
Fig. 2. PRISMA statement for the systematic approach used to identify studies that measured changes in telomere length in relation to age to develop telomere length as a biomarker for age in animals. Two databases were searched using the indicated Boolean search strings. Initial automated screening removed duplicates and used additional key words to filter the results. Potential studies from the cleaned dataset were sought for retrieval and assessed for eligibility in Mendeley. Additional studies were identified from citation searches. Details are provided for relevant exclusion criteria used at each step. The final set of included studies was analysed by citation network analyses to facilitate the synthesis of the literature. Further details are also provided for the retrieval of model details and summary statistics from individual studies for inclusion in meta-analysis. (Image edited in BioRender.com).
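For the studies where values were digitized from figures, the correlation coefficient of a simple linear regression with one predictor equals Pearson's r between age and the marker. The sketch below shows that calculation in plain Python; it is an illustration only, not the authors' code, and the data points are hypothetical placeholders rather than values taken from any included study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two sequences of equal length (at least 3 points)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical digitized points: age (years) versus a marker value.
ages = [1.0, 2.0, 3.0, 4.0, 5.0]
marker = [0.90, 0.80, 0.75, 0.60, 0.50]

r = pearson_r(ages, marker)
print(round(r, 3))
```

The sign of r carries the direction of the relationship (negative here, because the hypothetical marker declines with age), which matters when the value is later z-transformed.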
The effect sizes of the treatment effect (TE), expressed as Fisher's-Z, as well as the variance thereof expressed as standard error (SETE), were computed with the R package compute effect size 0.2-2 [19], based on equations derived from "The Handbook of Research Synthesis and Meta-Analysis" [20]. Where the correlation (r) was available, the z-transformed effect sizes, Fisher's-Z, were calculated as per equation 1.
Where the correlation (r) was not available, Fisher's-Z was derived from the reported statistical measures by first converting the given measure to a correlation. For Chi-squared (χ2), equation 2 was used, where the correlation is derived from the quotient of the square root of Chi-squared and sample size (n).
For F-test statistics (f), equation 3 was used to first calculate Cohen's d (d) by taking the square root of the f-value multiplied by the sum of group sizes divided by the product of group sizes. From d, the correlation was derived using equation 4, where d is divided by the square root of the sum of (i) the square of d and (ii) the quotient of the sum of squared group sizes and the product of group sizes.
For t-test (t) statistics, equation 5 was used to calculate the correlation by taking the square root of the quotient of (i) the square of the t-value and (ii) the sum of t-squared and the sample size (n) minus two.
Where only the probability (p-value) was available, the correlation was derived by first calculating Cohen's d using the quantile function of the Student's t-distribution (where q represents the quantile function, α represents the desired significance level from the p-value, and df represents the degrees of freedom from the sample size) and multiplying by the square root of the quotient of the sum of group sizes and the product of group sizes (equation 6). The correlation could then be calculated as per equation 4.
The variance associated with individual correlations, var r, was calculated using equation 7 by dividing (i) the square of one minus r-squared by (ii) the sample size (n) minus one.
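The numbered equations are not reproduced in this text. A reconstruction from the verbal descriptions, assuming the standard forms of these conversions, is sketched below in LaTeX (the `\tag` commands assume the amsmath package). Three readings are assumptions rather than statements from the source: in equation 2 the square root is taken over the whole quotient; in equation 4 the term described as "the sum of squared group sizes" is read as the square of the summed group sizes, (n1 + n2)^2; and in equation 6 the mapping from the p-value to α (for example 1 - p/2 for a two-tailed test) is not specified in the text and is left symbolic.

```latex
\begin{align}
z &= \frac{1}{2}\,\ln\!\left(\frac{1+r}{1-r}\right) \tag{1}\\
r &= \sqrt{\frac{\chi^{2}}{n}} \tag{2}\\
d &= \sqrt{\frac{f\,(n_{1}+n_{2})}{n_{1}\,n_{2}}} \tag{3}\\
r &= \frac{d}{\sqrt{d^{2}+\dfrac{(n_{1}+n_{2})^{2}}{n_{1}\,n_{2}}}} \tag{4}\\
r &= \sqrt{\frac{t^{2}}{t^{2}+n-2}} \tag{5}\\
d &= q_{t}(\alpha,\mathit{df})\,\sqrt{\frac{n_{1}+n_{2}}{n_{1}\,n_{2}}} \tag{6}\\
\mathrm{var}_{r} &= \frac{\left(1-r^{2}\right)^{2}}{n-1} \tag{7}
\end{align}
```

Here n1 and n2 denote the two group sizes and n the total sample size.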
These data were incorporated into a systematic review and meta-analysis as previously published [1].
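As an illustration of the conversions described in this section, the following Python sketch implements the steps for equations 1 to 5 and 7 as they are described in words. It is not the authors' R code: the function names are hypothetical, the formulas follow the standard forms (in particular, the group-size term in the d-to-r conversion is assumed to be the square of the summed group sizes divided by their product), and the p-value route (equation 6) is left out because it needs a Student's t quantile function that the Python standard library does not provide.

```python
import math


def fisher_z(r):
    """Equation 1: Fisher's z-transformation of a correlation (|r| < 1)."""
    return 0.5 * math.log((1 + r) / (1 - r))


def r_from_chi2(chi2, n):
    """Equation 2: correlation from a Chi-squared statistic and sample size."""
    return math.sqrt(chi2 / n)


def d_from_f(f, n1, n2):
    """Equation 3: Cohen's d from an F statistic and the two group sizes."""
    return math.sqrt(f * (n1 + n2) / (n1 * n2))


def r_from_d(d, n1, n2):
    """Equation 4: correlation from Cohen's d (assumed group-size correction)."""
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d ** 2 + a)


def r_from_t(t, n):
    """Equation 5: magnitude of the correlation from a t statistic."""
    return math.sqrt(t ** 2 / (t ** 2 + n - 2))


def var_r(r, n):
    """Equation 7: variance of a correlation for sample size n."""
    return (1 - r ** 2) ** 2 / (n - 1)


# Hypothetical example: a study reporting F = 4 with two groups of 10 animals.
d = d_from_f(4.0, 10, 10)
r = r_from_d(d, 10, 10)
z = fisher_z(r)
print(round(d, 4), round(r, 4), round(z, 4), round(var_r(r, 20), 4))
```

The conversions from Chi-squared, F, and t return the magnitude of the correlation only, so the direction of the effect has to be restored from the original study before the z-transformation.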

Limitations
Tests for funnel plot asymmetry as a measure of potential publication bias were statistically significant (p-value < 0.05), indicating the potential absence of studies from the primary literature. This is likely due to small-study effects, where studies with smaller sample sizes are more likely to be excluded from publication through peer review. The low number of studies reported from the global South and Africa may also indicate that similar studies from these regions are missing from the primary literature, possibly due to a lack of priority or suitable resources. The trim-and-fill method was used to infer possibly missing studies: 18 studies were inferred and added as missing from the methylation dataset and 49 from the telomere dataset, although this did not substantially alter the overall interpretation of the results. Additionally, it should be noted that the methylation dataset contained an abundance of studies in mammals, with fewer studies from other vertebrate classes, while the telomere dataset included an abundance of studies in birds. There was, however, little evidence that vertebrate class had a significant effect on the measured attributes. Lastly, differences in the timespan of publications may affect interpretations, as reported effect sizes are known to show temporal variation. Typically, studies that show large effect sizes are published first, establishing the validity of a method or introducing the field, whereas later studies, with smaller effect sizes, are often published a decade later. Given that methylation studies have been published over only half the period covered by telomere studies, future methylation studies may still report lower effect sizes.

Ethics Statement
Data used in our review represent secondary data from published literature and online resources. Included studies complied with the ARRIVE guidelines and were carried out in accordance with the United Kingdom (UK) Animals (Scientific Procedures) Act, 1986 and associated guidelines; European Union Directive 2010/63/EU for animal experiments; the National Institutes of Health (US) guide for the care and use of laboratory animals (NIH Publications No. 8023, revised 1978); or other ethical guidelines as per the country of origin. Ethics approvals for the present study were obtained from the University of the Free State (approval number: UFS-AED2020/0015/1709) as well as the South African National Biodiversity Institute (approval number: SANBI/RES/P2020/30).

Fig. 1. PRISMA statement for the systematic approach used to identify studies that measured methylation in relation to age to develop methylation as a biomarker for age in animals. Two databases were searched using the indicated Boolean search strings. Initial automated screening removed duplicates and used additional key words to filter the results. Potential studies from the cleaned dataset were sought for retrieval and assessed for eligibility in Mendeley. Additional studies were identified from citation searches. Details are provided for relevant exclusion criteria used at each step. The final set of included studies was analysed by citation network analyses to facilitate the synthesis of the literature. Further details are also provided for the retrieval of model details and summary statistics from individual studies for inclusion in meta-analysis. (Image edited in BioRender.com).

Fig. 3. Citation network for methylation studies identified in database literature searches, visualised in VOSviewer and CitNetExplorer. The top panel shows the clustering analysis performed in VOSviewer, which identified four key groups, labelled 1 through 4. The bubbles indicate key authors, labelled by surname and initials, with bubble size corresponding to the number of citation links with other authors. The bottom panel shows the citation network analysis of publications in CitNetExplorer, organized by year (2014-2023), with the name and first initial of the first author indicating individual studies. Relationships between studies by virtue of co-citations in the reference lists are indicated by grey lines. Subgroup analyses identified four key clusters, coloured according to the groups from VOSviewer. (Image edited in BioRender.com).

Fig. 4. Citation network for telomere studies identified in database literature searches, visualised in VOSviewer and CitNetExplorer. The top panel shows the clustering analysis performed in VOSviewer, which identified nine key groups, labelled 1 through 9. The bubbles indicate key authors, labelled by surname and initials, with bubble size corresponding to the number of citation links with other authors. The bottom panel shows the citation network analysis of publications in CitNetExplorer, organized by year (2002-2023), with the name and first initial of the first author indicating individual studies. Relationships between studies by virtue of co-citations in the reference lists are indicated by grey lines. Subgroup analyses identified several key clusters, coloured according to the groups from VOSviewer. (Image edited in BioRender.com).
© 2023 The Author(s). Published by Elsevier Inc.
A Boolean search string was used to search two scientific databases. Initial results were narrowed down by automated filtering techniques, such as additional search terms used as inclusion and exclusion criteria. Results were screened manually by assessing the titles, abstracts, and key words for relevance. Additional studies were identified from ancillary "free term" searches. The final set of preliminary studies for inclusion was sought for full-text retrieval. Relationships between included studies were explored by citation network analysis of bibliographic coupling. Data collated from studies included species, sample size, and statistical measures used to calculate effect sizes and variances.
Data source location: Data included in the meta-analysis and review were collated from scientific literature (Scopus and Dimensions databases) and represent a globally distributed dataset from 2001 to 2023, with most studies originating from North America, Europe, and Australia. A smaller number of studies included sampling from South America and Africa.

Table 1
Field names and data description for search results retrieved from Scopus.
Field name          Description
Link                URL to online page for article at publisher
References          Full list of references cited in the article
Document Type       Type of publication, e.g., article, review, book
Publication Stage   Status of publication, e.g., final or in press
Open Access         Open access status, e.g., green or gold
Source              Database used as source to retrieve document details, e.g., Scopus
EID                 Electronic Identifier (EID), usually the last part of DOI

Table 2
Field names and data description for search results retrieved from Dimensions.

Table 3
Field names and data description for Scopus results after transformation for citation network analysis.

Table 4
Field names and data description for file containing extracted study data from the review.

Table 5
Field names and data description for file containing extracted data prepared for meta-analysis.
... bias from authorship as Low, Medium, or High
Cal.Bias            Raw value for the normalised calculated author bias values
* Group 1: single species for model, validated in same species; Group 2: single species for model, validated in different/related species; Group 3: multiple species for model, validated in multiple species.