Early life adiposity and telomere length across the life course: a systematic review and meta-analysis

Background: The relationship between adiposity at birth and in childhood, and telomere length, is yet to be determined. We aimed to systematically review and meta-analyse the results of studies assessing associations between neonatal and later childhood adiposity, and telomere length. Methods: We searched Medline, EMBASE and PubMed for studies reporting associations between adiposity measured in the neonatal period or later childhood/adolescence, and leucocyte telomere length measured at any age by quantitative polymerase chain reaction or terminal restriction fragment analysis, either cross-sectionally or longitudinally. Papers published before April 2017 were included. Results: Of 230 abstracts assessed, 23 papers (32 estimates) were retained, from which 19 estimates were meta-analysed (15 cross-sectional, four longitudinal). Of the 15 cross-sectional estimates, seven reported on neonates: four used binary exposures of small-for-gestational-age vs. appropriate-for-gestational-age (or appropriate- and large-for-gestational-age), and three studied birth weight continuously. Eight estimates reported on later childhood or adolescent measures; five were from studies of binary exposures (overweight/obese vs. non-obese children), and three used continuous measures of body mass index. All four longitudinal estimates were of neonatal adiposity, with two estimates for small-for-gestational-age vs. appropriate-for-gestational-age neonates, and two estimates of birth weight studied continuously, in relation to adult telomere length (49-61 years). There was no strong evidence of an association between neonatal or later childhood/adolescent adiposity and telomere length. However, between-study heterogeneity was high, and there were few combinable studies. Conclusions: Our systematic review and meta-analysis found no strong evidence of an association between neonatal or later childhood or adolescent adiposity and telomere length.


Introduction
Telomeres are regions of repetitive (TTAGGG)n sequences situated at the ends of chromosomes. They buffer against loss of coding DNA (the 'end replication problem'), and there is evidence that telomere length is associated with chronological age 1 and, longitudinally, with diseases of later life, such as cardiovascular disease 2,3 and cancer 3,4 .
In addition to disease states, an association has been observed between unhealthy lifestyle factors and a reduction in telomere length 5 . This has led to the suggestion that telomere length may lie on the causal pathway between traditional risk factors and chronic disease 6 . One such studied risk factor is adiposity; there is evidence that greater adiposity in adults is associated with shorter telomere length, in both cross-sectional and longitudinal studies 7,8 . Given that obesity may result in chronic levels of inflammation and oxidative stress 9 , and that telomeric DNA is vulnerable to damage by oxidative stress 10 , it is plausible that obesity may promote telomere attrition 7 .
Findings from existing studies that have assessed the association between obesity and leucocyte telomere length in children are conflicting, with studies reporting positive 11 , negative 12 and null 13-19 findings. Two systematic reviews of adiposity and telomere length that primarily focused on adiposity measured in adults have also briefly reported on evidence from studies of adiposity in childhood: Mundstock et al. 8 systematically reviewed and meta-analysed the results of three cross-sectional studies 12-14 of the association between childhood obesity and telomere length 8 . This review reported greater childhood adiposity to be associated with shorter telomere length. Müezzinler et al. 7 retrieved three studies assessing the association between body mass index (BMI) and telomere length in children, but concluded that none of the studies were suitable for meta-analysis 7 . Additional studies have been published since these reviews. Furthermore, neither review assessed the association of adiposity at birth (as opposed to in later childhood) with telomere length. This is of interest for two reasons: firstly, in utero adversity is a predictor of later chronic diseases 20 , for which telomere length may be a risk factor 2,3 ; and secondly, telomere length is a marker of numerous adverse conditions across the life course, yet few studies have examined markers of adversity in the prenatal period (a time of active cell replication) in relation to telomere length 21 . Identifying associations in children (as opposed to adults) may also provide useful information about the ages at which associations between adiposity and telomere length emerge, and whether or not the direction and magnitude of the association between adiposity and telomere length are consistent through infancy, later childhood/adolescence and adulthood.
Here, we report the results of a systematic review and meta-analysis of both cross-sectional and longitudinal studies from the general population (i.e. in non-clinical populations) that have assessed the relationship between measures of neonatal adiposity and/or adiposity in older children, and telomere length.

Inclusion criteria
Eligible studies included those with at least one measure of adiposity in the neonatal period or later childhood/adolescence (hereafter used interchangeably with 'childhood', defined as after the neonatal period [0-28 days], with mean age <19 years). Any measure of adiposity was considered, including (but not restricted to) BMI, weight, waist circumference, waist-to-hip ratio, waist-to-height ratio, skinfold thickness, fat mass, ponderal index, and birth weight. The outcome considered was leucocyte telomere length measured in peripheral venous or cord blood, by either quantitative polymerase chain reaction (qPCR) or terminal restriction fragment analysis (TRF). Leucocyte telomere length is commonly considered as a proxy for 'whole-body' ageing and biochemical stress, as well as being a risk factor for disease in its own right 22 . We considered both cross-sectional studies in which adiposity and telomere length were measured concurrently and longitudinal studies in which adiposity was measured in the neonatal period/childhood and telomere length was measured after a follow-up period, i.e. in either childhood or adulthood.
Studies were included even if adiposity measures were not the primary exposure (for example, studies in which adiposity measures were included as covariates), provided that a relationship between adiposity and telomere length was assessed. Papers were only included if adiposity exposures were adjusted for age and sex, or if effect estimates were adjusted for (or stratified by) age and sex. These criteria were relaxed if the estimate was based on a sample in which participants' ages varied by no more than three years, if exposure groups were matched by age or sex, or if it was shown that age or sex was not associated with telomere length in the population of interest.

Amendments from Version 1
In this new version of our article, we have responded to reviewer comments so as to clarify some aspects of our study. The major changes are as follows:
• In the Methods section, we have now stated that we undertook separate meta-analyses of continuous and binary exposures, and of cross-sectional and longitudinal studies. We have also added that leucocyte telomere length is commonly used as a proxy for whole-body ageing. We state that although we understand that small-, appropriate- and large-for-gestational age are not measures of adiposity per se, they have been previously shown to be proxies for adiposity. We have also clarified our use of the term 'childhood', which we use to mean the period after the neonatal period but before adulthood (we originally used 'childhood' throughout for brevity, but added this clarification as a reviewer correctly pointed out that this term usually encompasses the neonatal period). Appropriate edits have been made to the Abstract and Introduction to account for this.
• In the Results section, we have clarified the legends of the forest plot figures to enhance their readability. We have also added the numbers of individuals in the lean and adipose groups of studies using binary exposures to Table 1 and Table 2.
• Finally, in our Discussion, we have added a paragraph on the importance of perinatal complications and how these might be considered in future analyses. We acknowledge further possible sources of heterogeneity and residual confounding, and discuss how these might have affected our results. We also discuss the possible consequences of combining studies that measured telomere length at different ages.

Exclusion criteria
Studies examining the effect of an intervention were not included, unless a pre-intervention, cross-sectional estimate of the relationship was provided. Furthermore, studies were excluded if participants were selected into the study on the basis of comorbidities (e.g. sleep apnoea, maternal stress, prematurity). Articles were also excluded if no full text was available from the British Library.

Search strategy
Medline and EMBASE were searched using the Ovid platform, and PubMed was also searched. Searches were run until April 2017. Search terms included thesaurus terms (MeSH/Emtree) for 'telomere length', 'adiposity', 'obesity', 'weight' and 'birth weight'. In addition, thesaurus terms for infants and children were used. Appropriate synonyms were identified for all terms above and entered into the search as keyword searches in the title and abstract. The search strategy is detailed in Supplementary File 1.
Studies were considered eligible for screening regardless of language, provided that a translator could be sourced within the department where the review was performed. Reference lists of pertinent papers were searched in order to identify additional studies that may have been missed by the search strategy.
Only peer-reviewed sources of evidence (journal articles, doctoral theses) were included. If there was evidence of dual publication of a study population, the largest population was used (provided that this was available in full-text form). Conference abstracts were not included, but relevant abstracts were cross-referenced against the search results to ensure that any follow-up peer-reviewed sources resulting from the same data were included.

Study screening and selection
One reviewer (AG) screened all titles and abstracts and excluded those that were clearly ineligible according to the criteria above. Decisions on remaining titles were made after discussion between two researchers (AG and ELA). Data were extracted from relevant full-text articles by two researchers (AG and ELA), using a standardised extraction form. Study authors were contacted to clarify ambiguous results. Any disagreement between the two researchers performing data extraction was resolved by discussion. Supplementary Figures 1 and 2 show flowcharts detailing the review and extraction process.

Statistical analyses
To facilitate the pooling of results according to different transformations of both exposures and outcomes (e.g. normalisation, z-scoring, log-transformation), all estimates were standardised for the meta-analyses. Plot digitiser software [http://arohatgi.info/WebPlotDigitizer] was used to extract data from studies presenting differences in means in the form of bar charts. For studies presenting estimates of average telomere length by adiposity exposure groups (for example, in small-for-gestational-age neonates compared to normal- and large-for-gestational-age neonates), effect sizes were expressed as the difference in telomere length (in SD units) between the two groups. For studies that analysed adiposity and telomere length as continuous variables, effect sizes were expressed as change in telomere length (in SD units) per 1-SD increase in the exposure variable. Formulae used for calculating standardised effect estimates and their standard errors are provided in Supplementary File 2.
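The exact formulae used are given in Supplementary File 2, which is not reproduced here. As an assumption-labelled sketch of the two standardisations just described, the code below computes (a) a standardised mean difference (Cohen's d, with its usual large-sample standard error) from group means, SDs and sizes, such as values digitised from a bar chart, and (b) a regression slope rescaled to SD change in telomere length per 1-SD increase in exposure. The function names are hypothetical, and the formulae in Supplementary File 2 may differ (for example, by applying Hedges' small-sample correction).

```python
import math


def standardised_mean_difference(mean_1, mean_2, sd_1, sd_2, n_1, n_2):
    """Cohen's d for group 1 minus group 2, with an approximate standard error.

    Assumption: telomere length is roughly normal within each group. The pooled
    SD weights each group's variance by its degrees of freedom.
    """
    pooled_var = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / (n_1 + n_2 - 2)
    d = (mean_1 - mean_2) / math.sqrt(pooled_var)
    # Large-sample variance of d (Hedges & Olkin approximation).
    var_d = (n_1 + n_2) / (n_1 * n_2) + d**2 / (2 * (n_1 + n_2))
    return d, math.sqrt(var_d)


def standardised_slope(beta, se_beta, sd_exposure, sd_outcome):
    """Rescale a slope (outcome units per exposure unit) to SD of outcome per 1-SD exposure."""
    scale = sd_exposure / sd_outcome
    return beta * scale, se_beta * scale
```

For instance, two hypothetical groups of 50 with mean relative telomere lengths of 1.0 and 0.8 and a common SD of 0.5 give d = 0.4 with a standard error of about 0.20. Both estimates then sit on the same SD scale regardless of whether the original study reported T/S ratios, kilobases or z-scores.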
Estimates and standard errors were meta-analysed in Stata MP Version 13 (StataCorp, TX) with the 'metan' command, using random-effects models. In addition to combining estimates of adiposity at different ages in separate meta-analyses, we conducted separate analyses for binary and continuous exposures. Cross-sectional and longitudinal studies were also meta-analysed separately, since longitudinal studies may provide information on whether an association between telomere length and birth weight tracks across the life course. Heterogeneity was estimated using the I² statistic, which represents the percentage of the total observed variability that is due to true differences in effect estimates between studies rather than to chance variation 23 . Harmonisation of data in preparation for meta-analysis was performed in R (see script in Supplementary File 3). The Stata '.do' file for the meta-analysis is available in Supplementary File 4.

Results
Supplementary Tables 1-4 give details of all studies assessed, and the reasons for which they were excluded. All titles that passed screening were English-language papers. A total of 23 relevant studies (32 estimates) were identified after full-text screening.

Summary of retrieved studies
Estimates not included in meta-analysis. Thirteen estimates were not meta-analysed, either because they reported no estimate, or because the study design was not combinable with any other extracted estimate. The characteristics of these studies, along with the 13 reported effect estimates, are given in Table 1.
Seven of the 13 estimates not included in meta-analysis were of childhood adiposity exposures (including waist circumference 16 ; mean age at telomere measurement in longitudinal studies: 22 years 26 and 31 years 11 ). Generally, point estimates were negative, but confidence intervals were consistent with no association between measures of childhood adiposity and telomere length. One study reported a weak positive association between BMI at approximately 5 years and telomere length at 31 years, but only in women 11 .
Six of the 13 estimates not included in the meta-analysis studied neonatal adiposity, either as continuous ponderal index 21 , or as continuous 21,27 or categorical birth weight 25,28 . Of the six estimates, three were from twin studies 21,28 . One estimate was cross-sectional 27 , and five measured telomere length after a period of follow-up (age range at follow-up: 5-80 years) 21,25,28 . In both cross-sectional and longitudinal studies, there was no discernible pattern of associations between neonatal adiposity and telomere length.

Estimates included in meta-analysis.
The 19 estimates (from 19 studies) that were retained for meta-analysis are described in Table 2. Of these, 15 were cross-sectional and 4 were longitudinal. Of the 15 cross-sectional estimates, 7 reported on neonatal adiposity: 4 used binary exposures of small- vs. appropriate-for-gestational age (or appropriate- and large-for-gestational age) 29-32 , and 3 studied birth weight continuously 33-35 . Eight estimates concerned childhood adiposity (age range 2-17 years), of which 5 were from studies of overweight/obese vs. non-obese children 12-15,25 , and 3 were from studies of body mass index as a continuous measure 16-18 . The longitudinal studies assessed neonatal adiposity, and telomere length after follow-up (range: ~23-69 years) 21,36-38 : two studied small- versus appropriate-for-gestational age neonates 36,37 , and two studied birth weight as a continuous exposure 21,38 .

Meta-analyses
Cross-sectional studies. Figure 1 shows associations of cross-sectional studies of neonatal and childhood adiposity and telomere length. There was no evidence from these meta-analyses that neonatal adiposity or childhood adiposity were associated with concurrently measured telomere length.
Longitudinal studies. All longitudinal studies included in the meta-analysis measured adiposity only in neonates (i.e. no studies measured adiposity in childhood), with telomere length measured as early as 23.8 (SD 0.7) years 37 and as late as 69 years 21 . Pooled estimates are shown in Figure 2. There was no evidence that continuously studied birth weight was associated with prospectively measured telomere length. There was very weak evidence that adults born appropriate-for-gestational age had longer telomeres than those born small-for-gestational age (SMD [95% CI]=0.08 [0.01-0.14]).
Heterogeneity. Heterogeneity in meta-analyses of binary (non-continuous) adiposity exposures was variable but generally high (I² ranging from 0% to 90.3%). At the upper end, this suggests that as much as 90.3% of the variation in effect estimates is due to true differences between studies rather than to chance. Heterogeneity was much lower in meta-analyses using continuous measures of adiposity (I² range: 0-26%).
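The pooled estimates and I² values above were produced in Stata with 'metan'. As a hedged, non-authoritative illustration of how such numbers can be computed, the sketch below implements the DerSimonian-Laird random-effects estimator, which we assume (without having checked the .do file in Supplementary File 4) corresponds to the random-effects model used. The function name and the 1.96 normal-approximation confidence interval are choices made for this sketch.

```python
import math


def dersimonian_laird(estimates, std_errors):
    """Random-effects pooling of standardised estimates (DerSimonian-Laird).

    Returns the pooled estimate, its 95% CI, tau^2 and I^2 (as a percentage).
    """
    k = len(estimates)
    if k < 2:
        raise ValueError("need at least two estimates to pool")
    w = [1 / se**2 for se in std_errors]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    c = sum_w - sum(wi**2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c)  # between-study variance, truncated at zero
    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (se**2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    # I^2: share of total variability attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, ci, tau2, i2
```

For two hypothetical estimates of 0.0 and 2.0, each with standard error 0.5, Cochran's Q is 8 on 1 degree of freedom, so I² = (8 - 1)/8 = 87.5%, tau² = 1.75 and the pooled estimate is 1.0. With only two or three estimates per analysis, as in most analyses here, both tau² and I² are very imprecisely estimated.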

Discussion
We undertook a systematic review and meta-analysis of adiposity measured before 19 years of age in relation to longitudinal or cross-sectional estimates of telomere length measured in blood. To our knowledge, this is the first meta-analysis of adiposity and telomere length to synthesise evidence from neonatal measures of adiposity in relation to cross-sectionally or prospectively measured telomere length. We also provide updated estimates of the association of later childhood adiposity with telomere length 7,8 . We found no strong evidence for an association between any measure of neonatal or childhood adiposity and telomere length. A weak association suggesting that adults born small-for-gestational-age had shorter telomeres later in life was based on the meta-analysis of only two studies.
Generally, more heterogeneity was observed among effect estimates from studies assessing categorical adiposity measures (e.g. obese vs non-obese, small-for-gestational age vs. appropriate-/large-for-gestational-age); I² estimates suggested that much of the between-study variation observed was due to true differences between studies and not due to chance. Conversely, very low heterogeneity was observed in the studies using continuous adiposity exposures. We were unable to formally assess possible sources of heterogeneity with meta-regression among studies using categorical adiposity measures, due to the small number of studies. However, heterogeneity is likely to be, at least in part, due to the differing thresholds used to define adiposity categories (e.g. percentiles of BMI), as well as other potential sources, such as differing ethnicities between studies, and the methods used to measure telomere length.
Mechanisms for the association of adiposity and telomere length
It has been suggested that oxidative stress and inflammation are determinants of telomeric attrition, and it has been proposed that, as a source of oxidative stress 9 , obesity may accelerate loss of telomeric DNA 39 . When telomere length is considered as a non-causal biomarker of ageing, its shortening as a result of inflammation and oxidative stress is known as the 'telomeric clock' model 40 . However, there is evidence of a complex 'axis of ageing' between telomeres and mitochondrial function 41 ; it has therefore been suggested that telomere attrition may affect mitochondrial activity, thus leading to metabolic dysregulation 42 . In animal models, such mitochondrial dysfunction may manifest as increased adiposity and insulin resistance 43 . In this latter case, the causal direction would be reversed, with telomere attrition as a risk factor for disease. However, a Mendelian randomisation analysis (in which genetic variants are used as unconfounded instrumental variables for disease risk factors 44 ) of telomere length in relation to BMI found no association in this direction 3 .
Aviv and colleagues challenge the telomere clock hypothesis by suggesting "that individuals who are born with relatively short telomeres tend to enter adulthood with short leucocyte telomere length" 40 . Moreover, this group have observed that the variation in neonatal telomere length is larger than the average amount of attrition that would be expected over a lifetime. This challenges the clock hypothesis, since, if that hypothesis were true, individuals should all begin life with a 'clock time' of zero 40 . Therefore, an alternative hypothesis is that telomere length is largely pre-determined at birth 45 , and that variable rates of attrition in adulthood would not necessarily be enough to alter an individual's telomere length percentile ranking 40 .
Whilst this does not negate the possibility that oxidative stress later in life may still contribute to attrition, this group state that early determinants of telomere length may be more important 45 . Under this assumption, combining estimates of neonatal adiposity in relation to telomere length ascertained at different ages should not alter results appreciably, as each individual would be placed on a set trajectory, altered little by postnatal exposures. In this case, it could be postulated that neonatal adiposity would have a stronger association with telomere length than postnatal adiposity measures (including childhood adiposity). However, our results do not provide evidence for this hypothesis, since we found no strong evidence for an association between either neonatal or childhood adiposity and telomere length.

[Figure caption, partially lost in extraction: … The names of the analyses in each panel correspond to those given in Table 2. P-values next to the I² value in each meta-analysis correspond to the p-value for the Q-statistic from the test of heterogeneity.]

[Figure caption, partially lost in extraction: … for studies comparing telomere length in those born appropriate- and small-for-gestational-age, and the change in telomere length (SD units) per 1-SD increase in birth weight. Meta-analysis is by random-effects; 95% confidence intervals (CI) are shown, along with weights for each estimate. Box size is proportional to study weight, and black lines represent 95% CIs. Summary estimates for each panel are shown as diamonds. The scale is in standardised units (see Methods). Abbreviations: SGA/AGA=small-/appropriate- and/or large-for-gestational age; TL=telomere length; SD=standard deviation; ES=effect size. The names of the analyses in each panel correspond to those given in Table 2. P-values next to the I² value in each meta-analysis correspond to the p-value for the Q-statistic from the test of heterogeneity.]

Strengths and limitations
Although the relationship between adiposity and telomere length has been studied previously 7,8 , to our knowledge, this is the first study to systematically review and meta-analyse the evidence concerning neonatal adiposity measures and telomere length. However, there are a number of limitations to this work. Firstly, although we found 19 meta-analysable estimates, the differing study designs meant that estimates were only combinable in small groups, and 13 estimates were not combinable at all. Thus, power to detect associations within each individual category (most of which included only 2-3 estimates) was limited. Where possible, we contacted authors to obtain the information needed to standardise estimates, permitting them to be included in the meta-analysis. However, many of the source publications were written over 15 years ago, and the original data were not available. The meta-analysis may be subject to non-inclusion bias if the studies included in the meta-analyses differ from those not included. That said, we performed a narrative synthesis of the estimates that we were unable to include in the meta-analyses, and conclusions were largely the same. The small number of studies retrieved, combined with their poor combinability, meant that meaningful inference from risk of bias assessments would not have been possible. Despite finding no strong evidence of non-inclusion bias, we acknowledge that publication bias remains a possibility, and this is therefore a limitation of our work. We were not able to make meaningful inferences about the likely presence of small-study effects using funnel plots, since there were so few combinable studies in each group 46 .
Not only did studies vary in the measures of adiposity studied (e.g. low birth weight versus small-for-gestational age as measures of neonatal adiposity) and in whether these were studied as continuous or binary exposures, but they also varied in the method used to assay telomere length, the transformations performed on exposure and outcome variables, and the age of the children studied. Most studies performed only minimal adjustment for potential confounding variables (or only adjusted exposures), so we cannot rule out unmeasured or residual confounding. The lack of adjustment for prenatal factors in most studies also makes it difficult to establish whether the associations observed are due to a foetal predisposition to larger or smaller body size, or to in utero effects. For example, birth weight may act as a surrogate marker for many maternal sources of in utero adversity 47 , and it may be these mechanisms that are important in determining telomere length. A meta-analysis focussing specifically on these exposures would therefore be of value in the field. Although we did not find evidence of an effect in this study, a Mendelian randomisation (MR) framework may prove useful for establishing whether there is a likely causal relationship between adiposity and telomere length. Although Haycock et al. (2017) found no evidence of association between telomere length (exposure) and BMI (outcome) 3 , the reverse direction (adiposity→telomere length, as assessed in this review) has not been studied. Using the two-sample MR framework to assess adiposity as a causal determinant of telomere length would provide a well-powered method of assessing causality using summary-level genetic data.
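Two-sample MR combines variant-exposure associations from one set of summary statistics with variant-outcome associations from another. One widely used estimator is the fixed-effect inverse-variance-weighted (IVW) estimate, sketched below under stated assumptions: independent (uncorrelated) variants, valid instruments without horizontal pleiotropy, and first-order standard errors. The function name and the toy numbers are hypothetical, and this is not an analysis performed in this review.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted (IVW) two-sample MR estimate.

    Each variant contributes a Wald ratio beta_outcome / beta_exposure; the IVW
    estimate is the weighted mean of these ratios using first-order weights
    beta_exposure^2 / se_outcome^2.
    """
    if not (len(beta_exposure) == len(beta_outcome) == len(se_outcome)):
        raise ValueError("summary statistics must be aligned per variant")
    num = sum(bx * by / se**2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx**2 / se**2 for bx, se in zip(beta_exposure, se_outcome))
    estimate = num / den
    se_estimate = math.sqrt(1 / den)
    return estimate, se_estimate
```

For example, `ivw_estimate([0.1, 0.2, 0.05], [0.05, 0.1, 0.025], [0.01, 0.02, 0.01])` returns an estimate of about 0.5 (the Wald ratio shared by every variant) with a standard error of 1/15. In practice, pleiotropy-robust sensitivity analyses such as MR-Egger or the weighted median would usually accompany an IVW estimate.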
We harmonised effect estimates into standardised units that would allow comparison of estimates obtained from both qPCR and TRF telomere lengths. However, whilst this allowed comparisons of telomere metrics measured on different scales, it does not address measurement error. Generally, Southern blot estimates (by TRF) may be longer than telomere length measured by qPCR due to inclusion of subtelomeric regions in the measure 48 . Furthermore, there is evidence that different assays have different sensitivity to measuring extremes of telomere lengths, and as such the relationship between the two measures may be non-linear 48,49 . Quantitative PCR measurements (which relate the relative fluorescence of a telomere amplicon to a single-gene reference 50 ) have their own limitations, being more prone to inter- and intra-assay variation than the gold-standard measurement method of TRF analysis 48,51 . Whilst the majority of papers using qPCR reported coefficients of variation, suggesting that an attempt to minimise batch effects had been made, the single-gene reference for qPCR assays varied between studies, which may have affected assay performance.
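The qPCR measure described above is a ratio of telomere (T) to single-copy gene (S) signal. As a hedged illustration only, the sketch below computes a relative T/S ratio with the comparative-Ct (2^-ΔΔCt) formula, assuming 100% amplification efficiency for both amplicons and a reference DNA sample run on the same plate; individual studies may instead have used standard curves. It also computes the percentage coefficient of variation across replicates, the statistic that most of the qPCR papers reported. Function names are hypothetical.

```python
import statistics


def relative_ts_ratio(ct_telomere, ct_single_copy, ref_ct_telomere, ref_ct_single_copy):
    """Relative telomere (T) to single-copy gene (S) ratio via 2^-ddCt.

    Assumes both amplicons double every cycle (100% efficiency) and that the
    reference DNA sample was run on the same plate as the test sample.
    """
    delta_sample = ct_telomere - ct_single_copy
    delta_reference = ref_ct_telomere - ref_ct_single_copy
    return 2 ** -(delta_sample - delta_reference)


def coefficient_of_variation(replicates):
    """Percentage CV of replicate measurements (sample SD / mean * 100)."""
    return statistics.stdev(replicates) / statistics.mean(replicates) * 100
```

For example, `relative_ts_ratio(15.0, 22.0, 16.0, 22.0)` gives 2.0: relative to the single-copy gene, the sample's telomere signal crosses threshold one cycle earlier than the reference, implying roughly twice the relative telomere content. Because the result depends on which single-copy gene and which reference sample are used, T/S ratios from different laboratories are not directly comparable, which is one motivation for the SD-unit standardisation used here.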

Conclusions
We found no strong evidence of a relationship between either neonatal or childhood measures of adiposity and concurrently or prospectively measured telomere length, but there were few combinable studies, and amongst published studies there was substantial heterogeneity in observed effects. Further work is needed to clarify whether neonatal and childhood adiposity is associated with telomere length.

Data availability
All data underlying the results are available as part of the article and supplementary material, and no additional source data are required.
Competing interests
TRG reports funding from Sanofi, Biogen and GlaxoSmithKline for projects unrelated to the work presented in this manuscript.
The authors have addressed most of the issues raised in the review report, and I believe that the revised version now meets the journal's publication requirements.

Grant information
In reference to item 4 (minor concerns), I suggested that, owing to the importance of age and sex as confounders in telomere biology, the description of adjustment for age and sex should have had a more prominent role in the Methods section.
No competing interests were disclosed.

Competing Interests:
I have read this submission. I believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard. The authors aimed to systematically review and meta-analyze the results of studies assessing associations between early-life (neonatal/childhood) adiposity and telomere length. The article is well written and contains important and original information. Below I raise some questions, in addition to those already raised by the other reviewers, that I believe the authors should answer.
The search is over one year old. It may be worthwhile (though not mandatory) to rerun it. In addition, the search was carried out in only three databases; although these are important databases, others such as Web of Science and Scopus could also be consulted.
The selection of studies by title and abstract was performed by a single author. Although this does not prevent publication of the article, I suggest that in future reviews the authors follow the Cochrane recommendation: "Assessment of eligibility of studies, and extraction of data from study reports, should be done by at least two people, independently" (Chapter 7: Selecting studies and collecting data).
The text below would be better placed in the Results; the Methods should describe how the data were treated.
"There was considerable heterogeneity in the presentation of findings across studies (Table 1 and Table 2). Some studies 22-24 only presented differences in the form of bar charts"
Was the protocol registered in a database (preferably PROSPERO)? If not, I strongly suggest that the protocol be registered in future reviews.
A total of 427 papers published up to April 22, 2017 were obtained after searching Medline, EMBASE and PubMed (Supplementary Figure 3). A total of 230 titles remained for assessment. What were the reasons for excluding the other 197 articles? Were they duplicates, or ineligible, and if so, why?
In addition to the risk of publication bias, already raised by the other reviewers, the authors did not assess the risk of bias within individual studies.
The authors state in the Discussion that "The small number of studies retrieved, combined with their poor combinability, meant that meaningful inference from risk of bias assessments would not have been possible". However, the risk of bias within studies is assessed for each study individually, so the authors could use an existing tool for this analysis (or at least for a quality assessment of each study).

… The Cochrane Collaboration, 2011.

Are the conclusions drawn adequately supported by the results presented in the review? Yes
The manuscript references our article: Mundstock E, Sarria EE, Zatti H, et al.:

Referee Expertise: Exercise; Physical Activity
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above. Again, we would like to thank the reviewer for their encouragement and advice on our manuscript. We have responded to each of the queries in turn, hereafter.

The authors aimed to systematically review and meta-analyze the results of studies assessing associations between early-life (neonatal/childhood) adiposity and telomere length.
The article is well written and contains important and original information. Below I raise some questions, in addition to those already raised by the other reviewers, that I believe the authors should answer.
The search is over one year old. It may be worthwhile (though not mandatory) to rerun it. In addition, the search was carried out in only three databases; although these are important databases, others such as Web of Science and Scopus could also be consulted.

Screening by title and abstract was performed by a single author. Although this does not prevent publication of the article, I suggest that in future reviews the authors follow the Cochrane recommendation: "Assessment of eligibility of studies, and extraction of data from study reports, should be done by at least two people, independently" (Chapter 7: Selecting studies and collecting data).
Response: We thank the reviewer for their pragmatism. As is inevitable with any systematic review, there comes a point where it is no longer feasible to keep re-running the searches (and thus re-running multiple sets of analyses) if the paper is to be published. We have therefore been clear about the date when the search was last conducted. There is considerable overlap between the searchable databases, and given the substantial number of duplicates we retrieved in our literature search, we do not believe that interrogating additional databases would be likely to add to the number of papers retrieved, especially as we hand-searched reference lists to capture additional studies for inclusion.
Given that this systematic review is not a Cochrane Review specifically, we deemed it appropriate that one author screen for any potentially eligible studies that might be included. The author was very inclusive at this initial stage, bringing forward any paper that appeared to be potentially eligible for inclusion. Moreover, as mentioned above, additional papers were picked up in the searches of all reference lists for included papers (which was conducted by two independent authors). We feel it is very likely that any papers missed by the author conducting the initial 'eligibility' screening would have been later picked up in the reference lists of included studies.
The text below would be better placed in the Results; in the Methods you should describe how the data were treated: "There was considerable heterogeneity in the presentation of findings across studies (Table 1 and Table 2). Some studies [22-24] only presented differences in the form of bar charts."
Response: Whilst we appreciate that this sentence may initially read as Results and not Methods, we were explaining the reasons for standardising the estimates rather than commenting on empirical heterogeneity (which is in the Results section). For clarity, we have rewritten it as follows: Methods: "To facilitate the pooling of results according to different transformations of both exposures and outcomes (e.g. normalisation, z-scoring, log-transformation), all estimates were standardised for the meta-analyses. Plot digitiser software [http://arohatgi.info/WebPlotDigitizer] was used to extract data from studies presenting differences in means in the form of bar charts."
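The standardisation step described in this response can be illustrated with a minimal Python sketch. It assumes two common conversions: a regression slope of telomere length (TL) on an exposure, both in raw units, is rescaled to "change in TL [SD] per 1-SD exposure" by multiplying by SD(exposure)/SD(TL); and a difference in mean TL between two groups (e.g. SGA vs. AGA) is divided by the pooled within-group SD of TL. The function names and numbers are hypothetical, the authors' actual conversions (in their Stata .do file) may differ, and log-transformed outcomes would need extra handling that is not shown.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    """An effect estimate and its standard error on a common (SD) scale."""
    value: float
    se: float


def per_sd_slope(beta: float, se: float, sd_exposure: float, sd_telomere: float) -> Estimate:
    """Rescale a raw-unit slope of TL on an exposure to TL SDs per 1-SD exposure.

    Multiplying by SD(exposure) gives the change per 1-SD exposure; dividing by
    SD(TL) expresses that change in TL standard deviations. The SE is rescaled
    by the same constant, so the z statistic is unchanged.
    """
    scale = sd_exposure / sd_telomere
    return Estimate(beta * scale, se * scale)


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled within-group SD of TL for a two-group (binary exposure) comparison."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def mean_diff_in_sd(mean1: float, mean2: float, se_diff: float, sd: float) -> Estimate:
    """Express a difference in mean TL between two groups in SD units of TL."""
    return Estimate((mean1 - mean2) / sd, se_diff / sd)


# Hypothetical study: slope of 0.10 TL units per kg birth weight (SE 0.04),
# SD of birth weight 0.5 kg, SD of TL 0.6 units.
example = per_sd_slope(beta=0.10, se=0.04, sd_exposure=0.5, sd_telomere=0.6)
```

Because the SE is scaled by the same constant as the estimate, each study's z statistic is unchanged; only the units become comparable across studies.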
Was the protocol registered in a database (preferably PROSPERO)? If not, I strongly suggest that the protocol be registered in future reviews.

Response: We thank the reviewer for their suggestion.
A total of 427 papers published up to April 22, 2017 were obtained after searching Medline, EMBASE and PubMed (Supplementary Figure 3). A total of 230 titles remained for assessment. What were the reasons for excluding the other 197 articles? Were they duplicates, or were they ineligible, and if so, why?
Response: These 197 articles were excluded because they were exact duplicates of some of the 230 articles that passed to screening (i.e. they were picked up by multiple databases).

In addition to the assessment of publication bias, already raised by the other reviewers, the authors did not assess the risk of bias within studies.
The authors state in the Discussion that "The small number of studies retrieved, combined with their poor combinability, meant that meaningful inference from risk of bias assessments would not have been possible". However, risk of bias within studies is assessed for each study individually, so the authors could use a tool for this analysis (or at least for assessing the quality of each study).
Response: We thank the reviewer for their suggestion. We note that risk of bias within studies is extremely subjective. Tools for assessing quality in clinical trials are well-described but much less attention has been given to similar tools for observational epidemiological studies. Thus, formally examining bias within observational studies is not feasible, and there is no gold standard tool for its assessment. We refer the reviewer to the following systematic review of tools for assessing bias in observational studies (https://www.ncbi.nlm.nih.gov/pubmed/17470488), wherein the authors quote: "This review has highlighted the lack of a single obvious candidate tool for assessing quality of observational epidemiological studies. One might regard this review as the first stage towards development of a generic tool. In such an endeavour, one would need to reach a consensus on the critical domains that should be included. The development of the STROBE statement has involved extensive discussion among numerous experienced epidemiologists and statisticians. Despite targeting the reporting of studies, many items were no doubt selected due to presumed (or evidence of) association with susceptibility to bias. Thus the statement should provide a suitable starting point for development of a quality assessment tool, and we have been guided by it in our presentation of results."
No competing interests were disclosed.
Other authors had reported on evidence from studies of adiposity in childhood, but this is the first meta-analysis of adiposity and telomere length to synthesize evidence from neonatal measures.
This referee considers that the manuscript is well prepared and up to date. All tables and figures are clear and the references seem appropriate. The supporting material is also suitable; the authors also provided the Stata .do file used to perform the meta-analysis.
I believe that this manuscript will be a contribution for the readers of this journal, and I would recommend the paper for indexing after some changes.
The main limitation I find is mentioned by the authors in the Discussion section. Studies on the association between telomere length and adiposity in the neonatal period are not scarce (n = 11); however, the estimates were only combinable in small groups (n = 2, n = 3, n = 4). Studies vary not only in the measures of adiposity studied but also in whether they were studied as continuous or binary exposures. Each separate meta-analysis includes a small number of studies, and the power of the test in such circumstances is low.
Other major considerations should be taken into account. I would suggest a restructuring of the manuscript (mainly of the Methods section), and perhaps also changing the manuscript title. Some included studies aimed to assess the difference in telomere length between small-for-gestational-age and appropriate-for-gestational age (or appropriate- and large-for-gestational age). One dataset was even extracted from a manuscript on foetal growth retardation (Davy 2009, Table 2). It has been previously described that maternal complications (and even low or high birth weight) are related to complications in adult life. Moreover, the authors mention in the Discussion section that early determinants of telomere length may be important. I would suggest that a meta-analysis based on data on perinatal complications be considered separately, mainly because the underlying mechanisms could be different. Accordingly, the authors should dedicate a paragraph in the Discussion section to expanding on this topic.
Regarding the meta-analysis "AGA vs. SGA (Difference in TL [SD])", it should also be noted that even if effect sizes were expressed as the difference in telomere length between the two groups, those estimates are not measures of "adiposity".
Authors included both cross-sectional studies (in which adiposity and telomere length were measured concurrently) and longitudinal studies (in which adiposity was measured in the neonatal period/childhood and telomere length was measured later in childhood or adulthood):
Although the authors performed two separate meta-analyses, it would be helpful to explain in the Methods section that two analyses with different specific objectives will be carried out, because I believe that the interpretation of the results of one or the other may not be the same. For longitudinal studies, the authors did not establish an age at telomere length measurement. Therefore, it would be interesting for the reader to have a brief discussion about the aim of this specific study. I believe that the manuscript would be more valuable if data on telomere "shortening" were analyzed. Is it possible to add this analysis with the data provided in the manuscripts? Is it possible to request this information from the authors of the longitudinal studies?
To facilitate the pooling of results, all estimates were standardized: perhaps, to reduce possible heterogeneity across datasets, the raw data could have been requested from the original authors. Besides, this could have increased the number of included studies.
An assessment of potential "publication bias" is missing from the manuscript.
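Tests for publication bias of the kind requested here usually look for funnel-plot asymmetry. As a minimal sketch, Egger's regression test regresses each study's standardised effect (effect/SE) on its precision (1/SE) by ordinary least squares; an intercept far from zero suggests small-study effects. The function name is hypothetical, and the ten-study guard is an assumption taken from the Sterne et al. (2011) rule of thumb that the authors quote in their response. The returned t statistic would be compared with a t distribution on k - 2 degrees of freedom (for example with `scipy.stats.t.sf`), which this stdlib-only sketch does not compute.

```python
import math
from typing import Sequence, Tuple


def egger_test(effects: Sequence[float], ses: Sequence[float],
               min_studies: int = 10) -> Tuple[float, float, float, int]:
    """Egger's regression test for funnel-plot asymmetry (small-study effects).

    Fits: effect / SE = b0 + b1 * (1 / SE) by ordinary least squares and returns
    (b0, SE(b0), t = b0 / SE(b0), residual degrees of freedom).
    """
    k = len(effects)
    if k != len(ses):
        raise ValueError("effects and ses must have the same length")
    if k < min_studies:
        # Rule of thumb (Sterne et al. 2011): power is too low below 10 studies.
        raise ValueError(f"only {k} estimates; funnel-plot tests need at least {min_studies}")
    y = [e / s for e, s in zip(effects, ses)]   # standardised effects
    x = [1.0 / s for s in ses]                   # precisions
    x_bar, y_bar = sum(x) / k, sum(y) / k
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                               # slope: underlying effect
    b0 = y_bar - b1 * x_bar                      # intercept: asymmetry
    df = k - 2
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b0 = math.sqrt(rss / df * (1.0 / k + x_bar ** 2 / sxx))
    return b0, se_b0, b0 / se_b0, df
```

With two to five estimates per meta-analysis in this review, the guard would reject every analysis, which is consistent with the authors' decision not to test for funnel-plot asymmetry.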
Minor concerns: In the Discussion section, the authors mention that heterogeneity is observed in the meta-analysis of categorical variables, and that this could be due to the cut-off values used to define groups. However, at this point the authors should also mention other potential causes of heterogeneity, such as the telomere length measurement method, ethnicity, etc. Moreover, most studies adjusted for the potential confounders age and sex, but we cannot exclude unmeasured or residual confounding. It is also possible that perinatal factors were involved. For example, the data from the Entringer 2013 manuscript were adjusted for "Obstetric complications, preg. Specific stress" (Table 2); although the estimate was adjusted in this dataset, we cannot assume that other studies were free of perinatal complications.
In the Methods section, it is not clearly explained that different meta-analyses will be carried out. I would establish a priori that different analyses will be done and, accordingly, which types of data will be collected. For example, different measurements of "adiposity" were considered for incorporation into the meta-analysis, including weight and birth weight, but weight and birth weight are not measures of adiposity by themselves. Some studies presented estimates of average telomere length by adiposity exposure group, for example, obese vs. non-obese, and such studies are indeed eligible.
Perhaps the authors should add a brief paragraph on the meaning of measuring telomere length in peripheral blood or cord blood, rather than in the adipose tissue itself.
Given that telomere length varies by age and sex, the paragraph on adjustment for age and sex in the Methods section should be written separately.
Eligible studies included those with at least one measure of adiposity in the neonatal period or childhood (mean age <19 years). Childhood is the age span ranging from birth to adolescence. Perhaps the authors should reconsider the name of the group or the age range.
Please add the number of participants (n) in each group to the Tables (binary exposures).
In Table 2, I was not able to find the meaning of the letters (A to F) in the Analysis column.
In Table 2, in the column "Sex Adjustment", in row 4 (De Zegher 2016) and also in row 16 (Shalev 2014) there is a question mark of uncertain significance.
In Figures 1 and 2, the I² statistics and the p-values are shown. I assumed that the p-value corresponds to the test of heterogeneity, but this is not clear.
In Figures 1 and 2, it would be useful for the reader to find a brief explanation of how to read the forest plot.
Are the rationale for, and objectives of, the Systematic Review clearly stated? Partly

Is the statistical analysis and its interpretation appropriate? Yes
Are the conclusions drawn adequately supported by the results presented in the review? Yes

Competing Interests: No competing interests were disclosed.
We have read this submission. We believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however we have significant reservations, as outlined above.
The main limitation I find is mentioned by the authors in the Discussion section. Studies on the association between telomere length and adiposity in the neonatal period are not scarce (n = 11); however, the estimates were only combinable in small groups (n = 2, n = 3, n = 4). Studies vary not only in the measures of adiposity studied but also in whether they were studied as continuous or binary exposures. Each separate meta-analysis includes a small number of studies, and the power of the test in such circumstances is low.
Other major considerations should be taken into account. I would suggest a restructuring of the manuscript (mainly from the Methods section), and perhaps also changing the manuscript Title.
1. Some included studies aimed to assess the difference in telomere length between small-for-gestational-age and appropriate-for-gestational age (or appropriate- and large-for-gestational age). One dataset was even extracted from a manuscript on foetal growth retardation (Davy 2009, Table 2). It has been previously described that maternal complications (and even low or high birth weight) are related to complications in adult life. Moreover, the authors mention in the Discussion section that early determinants of telomere length may be important. I would suggest that a meta-analysis based on data on perinatal complications be considered separately, mainly because the underlying mechanisms could be different. Accordingly, the authors should dedicate a paragraph in the Discussion section to expanding on this topic.
Response: We thank the reviewers for raising this important point, which we have expanded upon in the Discussion. We think that the relationship between perinatal complications and telomere length is an interesting question in its own right, and one that would merit discussion in a separate paper. Our paper focuses on measures of adiposity, or measures that act as proxies for adiposity (SGA and AGA) in early life. We did choose to include one study on foetal growth restriction (FGR): Davy et al. (2009). We acknowledge that FGR in particular is associated with prenatal adversity. However, the study in question defined FGR simply 'as any newborn having a birth weight of ≤5th percentile for Filipino newborns at a given gestational age'. Therefore, although the nomenclature of the study referred to these newborns as being 'FGR', the definition of FGR in this study is similar to SGA, albeit with a more extreme cut-off value than is often used. Anthropometric measures are not reported to have been undertaken serially in utero when defining FGR in this study. We therefore believe that inclusion of this study with the other studies of SGA and AGA is justified. We acknowledge the use of different cut-offs for defining binary exposures of body size/adiposity as a general limitation of this study, which is also pertinent to exposures measured later in childhood (i.e. BMI): Discussion: "The lack of adjustment for prenatal factors in most studies also makes it difficult to establish whether the associations observed are due to a foetal predisposition to larger or smaller body size, or in utero effects. For example, birth weight may act as a surrogate marker for many maternal sources of in utero adversity (Tyrrell et al., 2016, https://www.ncbi.nlm.nih.gov/pubmed/26978208), and it may be these mechanisms that are important in determining telomere length. A meta-analysis focussing specifically on these exposures would therefore be of value in the field."
Regarding the meta-analysis "AGA vs. SGA (Difference in TL [SD])", it should also be noted that even if effect sizes were expressed as the difference in telomere length between the two groups, those estimates are not measures of "adiposity".
Response: The main focus of our narrative review and SR is adiposity; after conducting the review, it was the case that the greatest number of neonatal papers focussed on birth weight adjusted for gestational age. We acknowledge that some indices capture adiposity better than others, but would argue that LGA and SGA are proxies for higher and lower adiposity levels, given that SGA is defined as "weight" below the 10th centile for gestational age. We note that a recent paper has reported a high correlation between weight and adiposity in neonates (Chen et al., 2018, https://www.ncbi.nlm.nih.gov/pubmed/28990589).
2. Authors included both cross-sectional studies (in which adiposity and telomere length were measured concurrently) and longitudinal studies (in which adiposity was measured in the neonatal period/childhood and telomere length was measured later in childhood or adulthood): Although the authors performed two separate meta-analyses, it would be convenient to explain in the Methods section that two analyses with different specific objectives will be carried out, because I believe that the interpretation of the results of one or another study may not be the same.
Response: We agree that the analyses have different interpretations, and we have added a sentence to the Methods to clarify that separate analyses were carried out.
Methods: "Moreover, cross-sectional and longitudinal studies were also meta-analysed separately, since longitudinal studies may provide information on whether an association between telomere length and birth weight tracks across the life course".
Please also see our response to Minor Comment 2, as follows: "In addition to combining estimates of adiposity at different ages in separate meta-analyses, we also conducted different analyses for binary and continuous exposures".
For longitudinal studies, the authors did not establish an age at telomere length measurement. Therefore, it would be interesting for the reader to have a brief discussion about the aim of this specific study. I believe that the manuscript would be more valuable if data on telomere "shortening" were analyzed. Is it possible to add this analysis with the data provided in the manuscripts? Is it possible to request this information from the authors of the longitudinal studies?
Response: We acknowledge that the decision not to restrict to a particular age at telomere length measurement is a limitation of our study; whilst we would have liked to have explored the effect of age using meta-regression, such an analysis would not have been possible due to the small number of studies. We have also included and edited the following paragraph in our Discussion, which discusses the tracking of telomere lengths over time: Discussion: "…an alternate hypothesis is that telomere length is largely pre-determined at birth [Factor-Litvak et al., 2016]…"
We agree with the reviewer that studying shortening of telomere lengths over time may still be interesting; whilst this was not stipulated in our original protocol, our search strategy should have captured relevant studies examining change in telomere length. However, upon screening the literature for our current analysis, we only detected one paper studying change in telomere length in relation to childhood anthropometric measures. This paper reported a trial, and thus did not meet our inclusion criteria. We therefore think that more studies would be necessary before useful evidence synthesis could be undertaken.

To facilitate the pooling of results, all estimates were standardized:
Perhaps, to reduce possible heterogeneity across datasets, the raw data could have been requested from the original authors. Besides, this could have increased the number of included studies.
Response: Standardising estimates per se would not influence the heterogeneity of the results; we have simply put them on a comparable scale. Analysing data on the raw scale would be inappropriate, since a one-unit increase in one study is not always the same as a one-unit increase in another. We note that standardising estimates across studies so that they are on comparable scales is common practice in meta-analyses.
Since many of the studies were published several years ago, individual-level raw data were not available (and even after contacting authors, some of the summary-level data required to standardise estimates were also not available). Restricting analyses to those with individual raw data available would have resulted in us being able to combine fewer studies, which would have been problematic, given that we cannot exclude the possibility of non-inclusion bias in our work. We have edited the Discussion to address this point: Discussion: "Where possible, we contacted authors to obtain the necessary information to standardise estimates, permitting them to be included in the meta-analysis. However, many of the source publications were written over 15 years ago, and the original data were not available. The meta-analysis may be subject to non-inclusion bias if the studies included in the meta-analyses are different to those not included".

An assessment of potential "publication bias" is missing from the manuscript.
Response: We agree that investigating potential publication bias is important. However, we were unable to test for small-study effects (as one indicator of publication bias) using funnel plots, since we had so few studies in each category (the following is a quote from Sterne et al. 2011, https://www.ncbi.nlm.nih.gov/pubmed/21784880): "As a rule of thumb, tests for funnel plot asymmetry should not be used when there are fewer than 10 studies in the meta-analysis because test power is usually too low to distinguish chance from real asymmetry."

In Table 2, I was not able to find the meaning of the letters (A to F) in the Analysis column.
Response: We thank the reviewers for their comment. These letters were initially annotated on each of the panels of the Figures, corresponding to the six meta-analyses we ran. We have now changed the letters to the title of the analysis in each of the six meta-analyses presented, e.g. 'Change in TL [SD] per 1-SD birth weight' (Figure 1, first analysis), which are also explained further in the legends of each Figure (1 and 2).

In Table 2, in the column "Sex Adjustment", in row 4 (De Zegher 2016) and also in row 16 (Shalev 2014) there is a question mark of uncertain significance.
Response: We thank the reviewer for noticing this; we have now corrected it.

In Figures 1 and 2 the statistics I² and the p-values are shown. I assumed that the p-value corresponds to the analysis of heterogeneity, but it is actually not clear.
Response: These do indeed correspond to the heterogeneity statistic. We have clarified this in the legend of each Figure.

10. In Figures 1 and 2 it would be useful for the reader to find a brief explanation on how to read the Forest plot.
Response: We have elaborated further in the legend of Figure 1, and also added this information to Figure 2.
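To make the heterogeneity statistics discussed in these comments concrete, the following minimal Python sketch computes inverse-variance pooled estimates, Cochran's Q with its chi-square p-value on k - 1 degrees of freedom, I² = max(0, (Q - df)/Q) × 100%, and a DerSimonian-Laird random-effects estimate. It assumes that the Figures use these conventional definitions. The function names are hypothetical, and the authors' Stata .do file may use different options (for example, a fixed-effect rather than a random-effects model). The chi-square tail probability is computed from the series for the regularised incomplete gamma function so that only the standard library is needed.

```python
import math
from typing import Dict, Sequence


def chi2_sf(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with df degrees of freedom,
    computed as 1 - P(df/2, x/2) using the power series for the regularised
    lower incomplete gamma function P."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-16 and n < 10_000:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def meta_analyse(effects: Sequence[float], ses: Sequence[float]) -> Dict[str, float]:
    """Inverse-variance pooling with Cochran's Q, its p-value, I^2 (in %) and a
    DerSimonian-Laird random-effects estimate."""
    k = len(effects)
    if k < 2 or k != len(ses):
        raise ValueError("need at least two estimates with matching standard errors")
    w = [1.0 / s ** 2 for s in ses]                    # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    # DerSimonian-Laird moment estimator of the between-study variance tau^2
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c)
    w_re = [1.0 / (s ** 2 + tau2) for s in ses]       # random-effects weights
    pooled_re = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return {
        "fixed": fixed, "q": q, "df": df, "p_het": chi2_sf(q, df), "i2": i2,
        "tau2": tau2, "random": pooled_re, "se_random": math.sqrt(1.0 / sum(w_re)),
    }
```

With only two to five estimates per analysis, Q has few degrees of freedom and both tau² and I² are estimated very imprecisely. This is another reason why the reviewer's point about low power applies.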
Competing Interests: No competing interests were disclosed.