Measures of puberty in the Avon Longitudinal Study of Parents and Children (ALSPAC) offspring cohort

Background. When studying the development of children through the preteen years into adolescence, it is often important to link features of their physical and mental health to the stage of puberty at the time. This is complex since individuals vary substantially in the ages at which they reach different pubertal milestones.
Methods. The Avon Longitudinal Study of Parents and Children (ALSPAC) is an ongoing longitudinal cohort study based in southwest England that recruited over 14,000 women in pregnancy, with expected dates of delivery between April 1991 and December 1992. From 1999, information on puberty was collected using a number of different methods: (a) a series of annual questionnaires administered when the index children were aged between eight and 17 years, which were mainly concerned with the physical changes associated with puberty; (b) identification of the age at peak height growth using the SITAR methodology; and (c) retrospective information from the girls on their age at onset of menstruation (menarche).
Results. The advantages and disadvantages of each method are discussed.
Conclusions. The data are available for analysis by interested researchers.



Introduction
There is considerable evidence that the average age at onset of puberty has been decreasing over time in many parts of the world (Bräuner et al., 2020; Queiroga et al., 2020). Adding to concerns resulting from these unexplained secular trends are other changes occurring over time in regard to male biology, with increasing rates of testicular cancer (Huyghe et al., 2003) and decreasing sperm counts (Levine et al., 2017; Sengupta et al., 2018). These reports raise the question as to why such trends are occurring. A number of reviews have stated that more high-quality data are required, and that studies should be based on populations rather than on individuals referred, for example, for investigation of infertility.
Longitudinal cohort studies that start before birth present an excellent opportunity to provide such data and allow for detailed descriptions of general pubertal development in a geographically defined population.
In this data note we describe one cohort study that has such data available for research: different types of information have been collected on pubertal development by the Avon Longitudinal Study of Parents and Children (ALSPAC). The aim of this paper is to describe the different measures collected, together with their advantages and disadvantages. Scientists who wish to access the data are invited to refer to the data availability statement and to find the relevant variable names in this document.

Ethical statement
Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee (ALEC; IRB00003312) and the Local Research Ethics Committees. All methods were performed in accordance with the relevant guidelines and regulations. Informed consent for the use of data collected via questionnaires and clinics was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time (Birmingham, 2018). Detailed information on the ways in which confidentiality of the cohort is maintained may be found on the study website: http://www.bristol.ac.uk/alspac/researchers/research-ethics/.

The ALSPAC cohort
Pregnant women resident in Avon, UK, with expected dates of delivery between 1st April 1991 and 31st December 1992 inclusive were invited to take part in the study. In total, 20,248 pregnancies were subsequently identified as being eligible; the initial number of pregnancies enrolled was 14,541. The initial pregnancies comprised a total of 14,676 foetuses, resulting in 14,062 live births and 13,988 children who were alive at one year of age. After various attempts to bolster the initial sample with eligible cases, the total sample size for analyses using any data collected after the age of seven is 15,447 pregnancies, resulting in 15,658 foetuses. Of these, 14,901 children were alive at one year of age (Boyd et al., 2013; Fraser et al., 2013; Northstone et al., 2019).
Information collected relevant to this data note used self-completion questionnaires (https://www.bristol.ac.uk/alspac/researchers/our-data/questionnaires/puberty-questionnaires/) mailed to the main carer of the study child and to the child him/herself. As with the development of all ALSPAC questionnaires, many people, including members of the advisory committees, were involved in ensuring the validity of the questions asked, the design of the questionnaires and ethical aspects of the data collection. The questionnaires were sent out with a number (not a name), and on return a computer programme changed that number to a different number denoting the anonymised identification number. Incomplete and missing data were coded as such. Prior to administration the questionnaires were piloted on volunteers, and comments were taken into account in finalising and printing the questionnaires. Accurate information on height was obtained for the study children who attended clinics where they were measured.
Please note that the study website contains details of all the data that are available through a fully searchable data dictionary and variable search tool: http://www.bristol.ac.uk/alspac/researchers/our-data/.

Amendments from Version 1
What has changed
Introduction: Has been reworded for clarification and omits references to precocious puberty as suggested by Reviewer 2.
Methods: Annual questionnaires through adolescence (4th paragraph): Justification has been added on why congenital defects of male genitalia are included here.
Table 1: Age has been converted from months to years on recommendation from the reviewers.
The SITAR measurements: As requested by Reviewer 1, we have given a fuller explanation which includes the heights at each age. In all, over 61,000 height measurements were used to derive the SITAR components. Table 3 has been corrected on the bottom row.
Age at Menarche: We had described that age at menarche could be obtained from the annual questionnaires or via direct questioning at age 13.5 years and that previous research had used one or the other. There is no clear way of combining these two data sets and we have now described in the text the way in which another author has addressed the problem.
Discussion: The discussion has been changed substantially. In particular, a sentence has been added at the end of the first paragraph referencing a new paper that describes correlations between the various measures described in this data note, together with newly derived informative variables. An additional (final) paragraph has been added in response to Reviewer 3's comment, pointing out that future users of these data should carefully consider which pubertal descriptor(s) are the most appropriate for their needs.
Any further responses from the reviewers can be found at the end of the article.

The SITAR growth statistics
Super-Imposition by Translation And Rotation (SITAR) growth curve modelling has been used as an instrument for the identification of age at peak height velocity as an estimate of age at puberty (Cole et al., 2010). Height measures were collected by trained ALSPAC staff in clinics for study children between the ages of seven and 20 years and were used to derive a number of variables of potential use in the determination of the growth spurt related to puberty. These were described by Frysz and her colleagues (2018) as follows: "SITAR is a mixed effects shape-invariant growth curve model, consisting of a mean growth curve along with three transformations (size, tempo and velocity), used to describe how each individual differs from the mean curve. The three SITAR parameters are size, reflecting up/down shift from the mean curve; tempo, reflecting left/right shift (on the age scale) which corresponds to the relative timing of puberty based on aPHV (age at Peak Height Velocity), and velocity reflecting stretching/shrinking of the age scale and hence describing differences in the rate at which individuals pass through puberty."
The SITAR measurements have been used with a number of longitudinal epidemiological studies including ALSPAC and have shown that peak growth velocity occurred earlier in girls than boys (Elhakeem et al., 2022). The SITAR variables were associated with subsequent bone health (Elhakeem et al., 2019) and craniofacial measurements, particularly mandibular size (Queiroga et al., 2020), but showed little association with cardiovascular indicators at age 25 (Maher et al., 2021).
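For orientation, the three transformations quoted above are usually written in the following form, following Cole et al. (2010). The symbols are notational assumptions chosen for this sketch and are not taken from the data note or from the ALSPAC variable documentation.

```latex
y_{it} = \alpha_i + h\!\left( \frac{t - \beta_i}{\exp(-\gamma_i)} \right) + \varepsilon_{it}
```

Here $y_{it}$ is the height of individual $i$ at age $t$, $h(\cdot)$ is the mean growth curve (typically a natural cubic spline), $\alpha_i$ is the size effect (up/down shift), $\beta_i$ is the tempo effect (left/right shift on the age scale), $\gamma_i$ is the velocity effect (stretching or shrinking of the age scale), and $\varepsilon_{it}$ is the residual. Under this form, setting all three individual effects to zero recovers the mean curve.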

Annual questionnaires through adolescence
The primary aim of the puberty data collection through repeated questionnaires was to evaluate the effects of exposure to environmental chemicals upon defined growth and developmental markers in children. In particular, the aims as stated in the study proposal were to answer the questions: (i) Do prenatal maternal serum levels of persistent organic compounds predict premature sexual maturation in offspring; and (ii) Are there any birth outcomes (birth defects or developmental disabilities) in this cohort that may reflect prenatal exposure to specific environmental chemicals?
The original proposal stated that the ALSPAC data set offered a unique opportunity to evaluate the impact of prenatal and neonatal chemical exposures upon developing neurological and maturational indices. Many of these chemicals can interfere with normal endocrine and neurotransmitter pathways, e.g., persistent organochlorine chemicals including pesticides and PCBs (polychlorinated biphenyls) which cross the placenta and/or are excreted in breast milk.
To time the development of puberty, an established method uses a series of pictures known as Tanner staging (Marshall & Tanner, 1969; Marshall & Tanner, 1970). The latter questionnaire was developed for studies in the USA; however, the male version showed circumcised penises, which was deemed inappropriate for a British population, and so the pictures of the development of the penis were redrawn. Tanner staging assigns a numeric rank to the stage of secondary sexual development. Parental report or self-report of a child's Tanner staging has been validated elsewhere, e.g., for girls the correlations were 0.82 for self-report and 0.85 for maternal report vs clinical assessment (Brooks-Gunn et al., 1987; see also Morris & Udry, 1980); for boys the self-reported staging had a kappa coefficient of 0.88 (Duke et al., 1980).
A set of questions to the carer in the first of the male puberty questionnaires was used to determine the presence of malformations of the male genitalia such as hypospadias and undescended testes (cryptorchidism). Although not necessarily related to puberty, these defects are related to endocrine disrupting chemicals and form a component of the original research question (ii) outlined above.
The design of the questionnaires, all of which were named 'Growing and Changing', was such that separate questionnaires were produced for boys and girls (the sexes were as defined at birth). Those for boys had covers with pieces of a jigsaw puzzle, and those for the girls a flower spray (Puberty questionnaires | Avon Longitudinal Study of Parents and Children | University of Bristol). The colour of the covers varied with the sex and age of the participant.
A total of nine versions of this questionnaire were administered at ages varying from 97 months (eight years, one month) to 204 months (17 years) (Table 1). With the exception of questionnaire no. 7, which was only sent to those children attending the 15-year clinic (to obtain responses to the puberty questionnaire at the same time as the heights were measured by trained clinic staff), they were posted accompanied by other questionnaires. They were designed so that parents could complete them with or without the cooperation of their study child. The ages at completion of the questionnaires are presented in Table 1.
Table 1 footnotes: (a) The ages at which 80-90% of respondents had completed their questionnaire. (b) This was only sent to children who attended the 15-year clinic.
There was a difference in the numbers of completed questionnaires returned for the different sexes (Table 1), as girls were more likely to complete the questionnaire than boys. This difference increased with age. Although the response rate itself was lower than for other questionnaires administered during this time period, the sex difference in response mirrors that seen more widely across the ALSPAC data collection waves.

Contents common to boy and girl puberty questionnaires
For obvious reasons, most of the questionnaires were different for boys and girls. However, there were a number of common questions (Table 2). The parents were asked to measure their child's current weight and height (and given advice on how to measure height accurately). Answers to each of these measures could be given in metric or imperial units (imperial measures were then transformed to metric). For questionnaire no. 7, height and weight were omitted as these were measured at the 15-year clinic by trained ALSPAC staff. The participants who attended were asked to complete and bring their puberty questionnaire with them; the questionnaires were deposited anonymously in a box on arrival.
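The imperial-to-metric transformation mentioned above can be sketched as follows. This is only an illustration: the data note does not say which imperial units were recorded or how ALSPAC performed the conversion, so the feet/inches and stones/pounds inputs and the function names are assumptions. The conversion factors themselves are the standard exact definitions.

```python
# Illustrative sketch only; function names and input units are assumptions,
# not the ALSPAC processing code.

INCH_TO_CM = 2.54          # international inch, exact by definition
POUND_TO_KG = 0.45359237   # international avoirdupois pound, exact by definition
INCHES_PER_FOOT = 12
POUNDS_PER_STONE = 14


def height_to_cm(feet: float, inches: float = 0.0) -> float:
    """Convert a height given in feet and inches to centimetres."""
    return (feet * INCHES_PER_FOOT + inches) * INCH_TO_CM


def weight_to_kg(stones: float, pounds: float = 0.0) -> float:
    """Convert a weight given in stones and pounds to kilograms."""
    return (stones * POUNDS_PER_STONE + pounds) * POUND_TO_KG
```

For example, `height_to_cm(4, 6)` converts 4 ft 6 in (54 inches) and `weight_to_kg(5)` converts 5 stone (70 lb).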
For all questionnaires a simple question also asked about physical exercise. The wording for the boys was: 'In the past month, what was the average number of times that your son participated in vigorous physical activity (such as running, football, swimming, athletics)?'; for the girls it was: 'In the past month, what was the average number of times that your daughter participated in vigorous physical activity (such as running, dance, gymnastics, netball, swimming, or aerobics)?'.
At the end of each questionnaire, the date of completion was given. From this, and the child's date of birth, the age at completion was calculated in months. A note was made as to whether the questionnaire was completed by the child, a parent, someone else, or a combination of individuals.
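The age calculation described above can be sketched in a few lines. The data note does not state the rounding convention used, so counting completed calendar months is an assumption here, and the function names are hypothetical.

```python
from datetime import date


def age_in_completed_months(date_of_birth: date, date_completed: date) -> int:
    """Age at questionnaire completion in completed calendar months.

    Assumption: partial months are truncated, not rounded.
    """
    if date_completed < date_of_birth:
        raise ValueError("completion date precedes date of birth")
    months = (date_completed.year - date_of_birth.year) * 12
    months += date_completed.month - date_of_birth.month
    # Subtract one month if the day-of-month of birth has not yet been reached.
    if date_completed.day < date_of_birth.day:
        months -= 1
    return months


def months_to_years_and_months(months: int) -> tuple:
    """Split an age in months into (whole years, remaining months)."""
    return divmod(months, 12)
```

With these helpers, an age of 97 months corresponds to eight years and one month, and 204 months to 17 years, matching the range of ages given for the nine questionnaires.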

Malformations of the male genitalia
The boys' first questionnaire only included a question designed to ascertain whether the boy had any congenital malformation of the genitalia, particularly focussing on undescended testes and hypospadias. The wording was as follows: 'Sometimes boys are born with something not quite right with their penis or scrotum. Please read the descriptions below and tick all that apply.' The options given were: 'There was nothing wrong'; 'the testes were not in the scrotum (known as undescended testes)'; 'the hole in the penis was in the wrong place (known as hypospadias)'; and 'something else', which the respondent was asked to describe. Two further questions elicited how many testes the boy had in his scrotum now, and whether he had ever undergone an operation on his penis, testes, or scrotum. The detailed answers are available as text but have not yet been coded.
[Note: there have been two publications from ALSPAC on hypospadias using sources such as the mothers' descriptions of investigations and treatments in the first three years of the child's life (Hughes et al., 2002; North & Golding, 2000).]

Descriptors of male pubertal development
As shown in Table 2, the stage of development of the boy's genitals and of his pubic hair were measured on each occasion. These used the same set of pictures and definitions, adapted for ALSPAC and redrawn for this study, using the Tanner definitions (Marshall & Tanner, 1969; Marshall & Tanner, 1970).
The question on change of voice was not asked in the first questionnaire, as it was thought to be too early for this to have occurred. Similarly, collection of information on growth of hair in the armpits was started from the 10-year questionnaire.

Descriptors of female pubertal development
As shown in Table 2, the stage of development of the girls' breasts, and details of menstruation, were recorded on each occasion. These used the same set of pictures and definitions annually, adapted for ALSPAC with pictures redrawn for this study, using the Tanner definitions. As for the boys, collection of information on growth of hair in the armpits was only begun at the 10-year questionnaire stage. In addition, a question on age at menarche was asked of 5,112 girls attending the ALSPAC clinic at age 12.5 years (var ff2092).

Results
The SITAR measurements
Height measurements of 10,236 participants at ages from 5 to 20 years, each of whom had at least one measurement in each of three age periods (5 to <10, 10 to <15 and 15 to 20 years), were used by Frysz and colleagues (2018) to derive the SITAR components. In all, 61,290 height measurements were used. Table 3 indicates that the mean age at peak height velocity was almost two years later for boys than for girls.

Results from the annual questionnaires
Genital malformations in boys. Table 4 gives the frequency of early genital structural problems and surgical procedures reported in boys. Note that by age eight as many as 7.0% of boys had fewer than two testes in their scrotum, but only 0.6% were reported to have hypospadias, which compares with 0.64% found in ALSPAC when multiple sources of information were used (North & Golding, 2000).
Developmental phases for boys. Serious problems with the male Tanner genital stage data came to light when the data from the first five puberty questionnaires were analysed longitudinally. It was found that 27% of boys less than 10 years old appeared to regress in reported genital stage (Monteilh et al., 2011). This is in contrast with only 3-5% going backwards for each of male pubic hair stage, female breast stage, and female pubic hair stage. In addition, even after exclusion of males who went backwards in genital stage and males under 10 years of age, the estimated ages at transition into Tanner stages 2 or 3 produced by the modelling process are at least a year earlier than expected. It is therefore strongly recommended that the male Tanner genital stage data (variable PUB850) are not used. In contrast, the staging of pubic hair development is assumed to be reliable and has been used to demonstrate that the ages of pubic hair development are influenced by (a) a genetic variant (Ong et al., 2009a) and (b) growth in the first eight years (Monteilh et al., 2011).
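A check of the kind described above, flagging participants whose reported stage goes backwards between questionnaires, might look like the sketch below. It is not the method of Monteilh et al. (2011); the definition of regression used here (any decrease between consecutive non-missing reports) and the data layout are assumptions made for illustration.

```python
import math


def has_stage_regression(stages) -> bool:
    """Return True if any reported Tanner stage is lower than the previous report.

    `stages` is a chronologically ordered sequence of stages (1-5), with None
    marking a missing questionnaire. Missing values are skipped (assumption).
    """
    observed = [s for s in stages if s is not None]
    return any(later < earlier for earlier, later in zip(observed, observed[1:]))


def proportion_regressing(records) -> float:
    """Proportion of participants with at least one apparent regression.

    `records` maps a participant identifier to that participant's stage
    sequence. Only participants with two or more reports are counted.
    """
    eligible = [
        seq for seq in records.values()
        if sum(s is not None for s in seq) >= 2
    ]
    if not eligible:
        return math.nan
    return sum(has_stage_regression(seq) for seq in eligible) / len(eligible)
```

Applied separately to each descriptor (genital stage, pubic hair stage, breast stage), a summary like this lets the descriptors be compared on the same footing before deciding which to analyse.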
Despite the longitudinal trend in genital staging showing errors over time, the general trends of the different stages show the expected variation (Table 5). It can be seen that the modal stages at the different years of age were: at 8-10 years, >80% of boys were at stage 1; at age 10, 95% were at stages 1 or 2; but at ages 12-13 there was a wider difference in staging with 80% of boys at stages 2, 3 or 4. By age 14, 84% were at stages 4 or 5; and at 15, 90% were at this level of development.
The proportion of boys starting to grow pubic hair showed similar variation with 92% of 8-year-olds at stage 1 and 95% of 16-17-year-olds at stages 4 or 5 (Table 6). There was far less certainty concerning the boy's change of voice at ages 14 or over, with 10.6% of 17-year-olds being uncertain as to whether this had occurred yet (Table 7).

Developmental phases in girls. In contrast to the boys, the developmental phases for girls (as collected using the annual questionnaires) showed the expected increases in phases over time in breast and pubic hair development and onset of menstruation (Table 8-Table 10).

Age at menarche. As shown in Table 2, the question on the age at which the girl had her first period, together with questions concerning details of menstruation, were the same in all nine questionnaires. This enabled the age at menarche to be calculated.
However, it should be noted that the age at menarche can also be obtained for the study girls from another source within ALSPAC (questions asked in a clinic visit at age 13.5 years; variable fg6192; Figure 1). Some researchers have just used the latter source even though the ages are curtailed (Ong et al., 2009a and Ong et al., 2009b; Rogers et al., 2010), others have combined the two sources (Joinson et al., 2011), and the CDC group have just used the longitudinal data from the nine questionnaires (e.g., Adgent et al., 2012; Christensen et al., 2010a; Christensen et al., 2010b; Christensen et al., 2010c; Christensen et al., 2011; Maisonet et al., 2010; Rubin et al., 2009). Combining the two sources is complex, but when Joinson and colleagues (2011) considered the association between menarche and depression they categorised the data from the two data sets into early menarche (<11.5 years), normative (11.5-12.5 years) and late onset (>12.5 years).
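The three-way categorisation attributed to Joinson and colleagues (2011) can be expressed as a small function. This sketch applies only the cut-points quoted above to a single, already derived age at menarche; how the two data sources were reconciled is not described in this note, and the function name and labels are hypothetical.

```python
def menarche_category(age_years: float) -> str:
    """Classify age at menarche using the cut-points quoted in the text.

    early:     < 11.5 years
    normative: 11.5 to 12.5 years (both bounds inclusive, as stated)
    late:      > 12.5 years
    """
    if age_years < 11.5:
        return "early"
    if age_years > 12.5:
        return "late"
    return "normative"
```

Because both boundary values fall in the normative group, a girl whose derived age is exactly 11.5 or 12.5 years is classified as normative.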

Discussion
In this data note we have outlined the data collected on the developmental markers that are available in ALSPAC to estimate the likely onset of puberty for boys and girls. We have indicated that, for boys, the Tanner diagrams of genital development are problematic at early ages, but that age at onset of appearance of pubic hair is probably a valid measure. For girls, the Tanner pictures of breast development and appearance of pubic hair give plausible results, as does the age at menarche, particularly if using the Tanner questionnaires. For both sexes the age at peak height velocity gives ages in close alignment with those of the biological development measures. Correlations between the various measures described here, together with newly derived informative variables, can be found for this cohort elsewhere (Elhakeem et al., 2023). Interestingly, the authors also show that the published genetic risk scores for age at menarche are also associated with the different aspects of pubertal development as described here.

Strengths and limitations
The strength of these data lies in the fact that the population is geographically defined and includes the majority (~80%) of the eligible population. Other strengths include a large sample size, longitudinal design and regular administration of questionnaires on pubertal development (thus reducing the chance of recall bias), with additional availability of data on age at peak height velocity and age at menarche as well as potential confounding factors. Limitations of this study include sample attrition, which is strongly associated with socioeconomic disadvantage. An additional drawback is that, at the time of birth, there were very few families of non-white minorities resident in the area (~6%). Consequently, any research results cannot be extrapolated to cover non-white populations in general. As previously stated, there are also problems with the male genital development data. We suggest that researchers collecting male Tanner data by self-completion questionnaires might consider including a subset of participants in which staging is also identified using palpation and observation, to address the issue of validity of the self-completion measures used.
This data note provides details of various questionnaire-based measures that were originally designed to be used by researchers investigating possible effects of exposure to endocrine disrupting chemicals. However, they can be applied to any question concerning the antecedents and long-term consequences of age at onset of puberty in the ALSPAC cohort. In addition, the questions used are recorded here as researchers planning data collection on this topic may find them useful, particularly if planning to compare across cohorts.
There is probably no one measure that accurately captures 'age at puberty', given that puberty is a process, not a binary yes/no event. The various markers described here likely reflect different biological mechanisms to different extents, for example, levels of testosterone, adrenal, growth and gonadal hormones. The availability of multiple measures of pubertal timing may therefore facilitate comparisons that have the potential to yield mechanistic insights.

Data availability
Underlying data
ALSPAC data access is through a system of managed open access. The steps below highlight how to apply for access to the data included in this data note and all other ALSPAC data:
1. Please read the ALSPAC access policy (http://www.bristol.ac.uk/media-library/sites/alspac/documents/researchers/dataaccess/ALSPAC_Access_Policy.pdf) which describes the process of accessing the data and samples in detail, and outlines the costs associated with doing so.
2. You may also find it useful to browse our fully searchable research proposals database (https://proposals.epi.bristol.ac.uk/?q=proposalSummaries), which lists all research projects that have been approved since April 2011.
3. Please submit your research proposal (https://proposals.epi.bristol.ac.uk/) for consideration by the ALSPAC Executive Committee. You will receive a response within 10 working days to advise you whether your proposal has been approved.
For data specific to the puberty questionnaires, the variable numbers are given in the following tables. Each variable number should be preceded by PUB. For example, in Table 1, the variable for weight for the 4th questionnaire is PUB404.
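The naming rule above can be captured in a small helper for anyone building a variable list programmatically. The function name is hypothetical, and the check that variable numbers are purely numeric is an assumption based only on the two examples in this note (PUB404 and PUB850); the supplementary tables remain the authoritative source.

```python
def puberty_variable_name(variable_number) -> str:
    """Build a puberty questionnaire variable name by prefixing 'PUB'.

    Accepts an int or a string of digits. Assumption: variable numbers
    contain digits only.
    """
    number = str(variable_number).strip()
    if not number.isdigit():
        raise ValueError(f"expected a numeric variable number, got {variable_number!r}")
    return f"PUB{number}"
```

For instance, `puberty_variable_name(404)` yields the weight variable for the 4th questionnaire quoted above.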
This project contains the following extended data:
• Supplementary Tables.docx (variable names for questionnaires).
Data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0).

Open Peer Review
I understand that some loss of data is to be expected between the children in the cohort who reached one year of age, the children who could be identified for survey completion at 7-8 years of age (and in subsequent rounds), and the number of surveys completed in full. In order for a reader to confirm the 80% eligibility statement in the discussion, it would be helpful if the authors could provide those numbers in addition to reviewing eligible pregnancies, births, and those reaching one year old, which they have already included.
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: My focus is the transition from pediatric to adult health care.
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
I think the authors may have missed my final comment in the first review related to the implications of attrition bias. The authors noted that attrition was strongly related to socioeconomic status, but there was no further explanation as to how that might affect the data. I provided some potential implications in my last review and I would suggest adding these to the discussion among any others that the authors might be able to come up with.
Also, I note the following amendment: "There is probably no one measure that accurately captures 'age at puberty', given that puberty is a process, not a binary yes/no event. The various markers described here likely reflect different biological mechanisms to different extents, for example, levels of testosterone, adrenal, growth and gonadal hormones. The availability of multiple measures of pubertal timing may therefore facilitate comparisons that have the potential to yield mechanistic insights." I feel like this paragraph is not 100% correct. The main purpose of measuring different markers/descriptors is not to show that puberty is not a binary yes/no event, but really to demonstrate that puberty is a broad term that describes maturation of several different hormone systems that manifest as different physical changes (not sure if I've worded this very well but hopefully the authors understand what I'm trying to say). Moreover, testosterone is a gonadal hormone, so there is redundancy in the sentence "levels of testosterone, adrenal, growth and gonadal hormones".
First of all, I feel that this manuscript is important in that ALSPAC is indeed a highly valuable longitudinal adolescent health dataset that undoubtedly has much more to give.
A clear limitation of the dataset that I feel has not been adequately disclosed, however, is the self-reported nature of Tanner staging. The gold standard method is Tanner staging via palpation by a trained clinician. ALSPAC falls short in this respect, but understandably this is the compromise made for the sample size and longitudinality of the cohort, and also for ethical reasons. However, I'm failing to see this acknowledged adequately as a limitation in the discussion section. This is an important point as readers of this manuscript who may be interested in accessing the ALSPAC data may have expertise in other areas of adolescent health, but not necessarily in the nuances of different pubertal assessment methodologies.
It would also be useful to relate each pubertal descriptor used in ALSPAC to the underlying hormones responsible for such development. For example, pubertal height growth relates more to the synergistic actions of growth hormone and testosterone, the Tanner breast/genital measures are more reflective of gonadal hormones, whereas pubic hair development relates more to adrenal hormones. Depending on the health outcome of research interest, one descriptor may be more relevant than others. It would be important to point out that any future users of the ALSPAC data should carefully consider which pubertal descriptor(s) is most appropriate for their research question, as these descriptors do not measure the same thing and cannot be used interchangeably.
I commend the authors for being upfront about the quality of the male Tanner stage data. My team had similar experiences with our adolescent males (although to a lesser degree), but the fact that challenges relating to self-reported teen male data were documented in the manuscript is valuable for any future research in this area.
Finally, I just have a question: as the authors alluded to attrition related to "socio-economic disadvantage", would it be useful to expand on this further and talk about the implications of this attrition bias? That is, those with lower SES who dropped out of the study may be the ones experiencing earlier puberty due to potential childhood stressors, higher rates of obesity etc.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes
Competing Interests: No competing interests were disclosed.
Reviewer Expertise: Pubertal assessment, adolescent nutrition and obesity
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
longitudinal adolescent health dataset that undoubtedly has much more to give.A clear limitation of the dataset that I feel has not been adequately disclosed, however, is the self-reported nature of Tanner staging.The gold standard method is Tanner staging via palpation by a trained clinician.ALSPAC falls short in this respect, but understandably this is the compromise made for the sample size and longitudinality of the cohort, and also for ethical reasons.However, I'm failing to see this acknowledged adequately as a limitation in the discussion section.This is an important point as readers of this manuscript who may be interested in accessing the ALSPAC data may have expertise in other areas of adolescent health, but not necessarily in the nuances of different pubertal assessment methodologies.
Many thanks for your important observations. You are right in that, because of the considerable cost of doing so, the study did not include palpation of the male genitalia, although this was discussed at the planning stage. In retrospect it would have been valuable to palpate a random sample of boys and compare the results with the questionnaire data, and we have now mentioned that in the limitation section of the discussion.

It would also be useful to relate each pubertal descriptor used in ALSPAC to the underlying hormones responsible for such development. For example, pubertal height growth relates more to the synergistic actions of growth hormone and testosterone, the Tanner breast/genital measures are more reflective of gonadal hormones, whereas pubic hair development relates more to adrenal hormones. Depending on the health outcome of research interest, one descriptor may be more relevant than others. It would be important to point out that any future users of the ALSPAC data should carefully consider which pubertal descriptor(s) is most appropriate for their research question, as these descriptors do not measure the same thing and cannot be used interchangeably.

Thank you very much for pointing this out. We have inserted a paragraph at the end of the Discussion to cover this important point.

In this report, the authors compare their strategies for determining the stage of puberty for each participant in the ALSPAC cohort.

Competing Interests: None
The introduction is focused mostly on precocious puberty for the first few sentences, which is not mentioned in the abstract. It becomes evident toward the end of the introduction that the aim is to study puberty longitudinally in the cohort so that the data can inform our understanding of precocious puberty, and so the link comes together eventually. As a reader, I would have appreciated the link being made in both the abstract and the introduction. At the same time, I found the reasoning for the study described in the abstract much more compelling. I wonder if the authors might consider focusing the introduction on the idea that study of adolescent health overall would benefit from understanding findings relative to where an adolescent is in their pubertal development.
This is in part because the focus on precocious puberty raises a methodologic concern for me, which is that the authors didn't start collecting data until age 8, which is late for a focus on precocious puberty. If the goal is to study precocious puberty, wouldn't data collection have needed to start sooner? Similarly, if the interest is in studying precocious puberty, why wait until 10 years of age to ask about hair in the armpits?
In the strengths and limitations, the authors mention that they have data on 80% of the eligible population, but it is not clear to me how they got that number. The initial cohort included 15,000 births and the highest number of returned surveys in Table 1 is ~7,000, which is less than 50%. If they could clarify that point, I would appreciate it.

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes

Are sufficient details of methods and materials provided to allow replication by others? Yes
Are the datasets clearly presented in a useable and accessible format? Yes

The introduction is focused mostly on precocious puberty for the first few sentences, which is not mentioned in the abstract. It becomes evident toward the end of the introduction that the aim is to study puberty longitudinally in the cohort so that the data can inform our understanding of precocious puberty, and so the link comes together eventually. As a reader, I would have appreciated the link being made in both the abstract and the introduction. At the same time, I found the reasoning for the study described in the abstract much more compelling. I wonder if the authors might consider focusing the introduction on the idea that study of adolescent health overall would benefit from understanding findings relative to where an adolescent is in their pubertal development. This is in part because the focus on precocious puberty raises a methodologic concern for me, which is that the authors didn't start collecting data until age 8, which is late for a focus on precocious puberty. If the goal is to study precocious puberty, wouldn't data collection have needed to start sooner? Similarly, if the interest is in studying precocious puberty, why wait until 10 years of age to ask about hair in the armpits?

We are grateful to the Reviewer for raising this issue. She is correct in thinking that although the initiators of the pubertal data collection had intended to look at precocious puberty, bureaucratic delays meant that the children were already 8 years of age before we could start administering the questionnaires. We have therefore omitted any mention of precocious puberty in the text, and feel that the reasons behind the study now develop more rationally.

In the strengths and limitations, the authors mention that they have data on 80% of the eligible population, but it is not clear to me how they got that number. The initial cohort included 15,000 births and the highest number of returned surveys in Table 1 is ~7,000, which is less than 50%. If they could clarify that point, I would appreciate it.

Apologies for the confusion. The statement was intended to refer to the enrolment rate within the population at risk. As will be appreciated, when identifying eligible pregnancies as opposed to births, the numbers are approximate and vary according to gestation, pregnancy outcome and the time frame. In general, excluding the poor responses at the start and end of the enrolment period, 80% appears to be a reasonable estimate of the proportion of viable pregnancies enrolled.

does not yet really compare the different approaches.

Areas that need more comparison are:

1. Age at menarche: They describe two approaches to recording this, one being prospective self-completion questions annually to age 17 (Table 10) and the other being a retrospective question at age 13.5 (Figure 1), which would be bound to miss girls reaching menarche beyond that age. While the results in Table 10 suggest that the latter approach might miss only around 4% of children, this is not stated explicitly. Instead of showing Figure 1 it would be better to compare the results in one table and also show the result of the combined method mentioned in the text.
2. Pubertal changes in boys: Given the challenges in assessing penile size, it would be interesting to compare these to the data on pubic hair, voice breaking and underarm hair, for which no data are shown currently. Could these be combined in a figure and could their inter-correlation be assessed in some way, e.g. by looking at the correlation of the age at which boys entered stage 2 or had any voice change?
The use of the SITAR method is novel and interesting, but the results need a fuller explanation.A limitation of assessing peak height velocity is the availability of height data at the right ages.We need to be shown the numbers of heights at each age and the % of the children included with heights at most or all ages.
In the design of the new UK charts a simplified pubertal assessment approach was adopted with just 2 stages for each gender: starting puberty, which constituted any pubic hair in either gender, penile enlargement in boys or development of breasts in girls; and completion of puberty, being menarche in girls or voice breaking or underarm hair in boys. I am aware that this has not been fully written up yet, but as well as the brief mention cited here

The current section on strengths and limitations discusses the ALSPAC data as a resource, yet the data are now quite old and already much used. Surely the main point is to help people in their planning of future studies. In that case it is the strengths and limitations of the questions used that matter.
More minor points:

1. Ages given in months in Table 1 would be much clearer if given as decimal age in years.
2. Table 3 has a mistake in the bottom line: values for total N and boys have been displaced sideways.
I was unconvinced by the relevance, in a paper on puberty assessment, of including information on genital defects or activity levels.

Reviewer Expertise: Assessment of growth and nutritional status, Child nutrition, epidemiology

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.
Author Response 11 Jan 2024

Yasmin Iles-Caven
This interesting article describes the various ways in which puberty was assessed in the ALSPAC longitudinal cohort and gives a clear overall account of the data gathered and what it showed. The article states an aim to consider the advantages and disadvantages of different assessment approaches, but apart from a very interesting account of the difficulties of assessing genital size, it does not yet really compare the different approaches. Areas that need more comparison are:

1. Age at menarche: They describe two approaches to recording this, one being prospective self-completion questions annually to age 17 (Table 10) and the other being a retrospective question at age 13.5 (Figure 1), which would be bound to miss girls reaching menarche beyond that age. While the results in Table 10 suggest that the latter approach might miss only around 4% of children, this is not stated explicitly. Instead of showing Figure 1 it would be better to compare the results in one table and also show the result of the combined method mentioned in the text.

Thank you for raising this issue. Although there is no clear way of combining these two data sets, we have now described in the text the way in which another author has addressed the problem (see end of Results section).
2. Pubertal changes in boys: Given the challenges in assessing penile size, it would be interesting to compare these to the data on pubic hair, voice breaking and underarm hair, for which no data are shown currently. Could these be combined in a figure and could their inter-correlation be assessed in some way, e.g. by looking at the correlation of the age at which boys entered stage 2 or had any voice change?

The inter-correlations between the various ALSPAC measures of puberty have been explored in detail by Elhakeem and colleagues (2023). Although these results are available online, we did not wish to quote them before they had been published after peer review. Reference to this is now included in the first paragraph of the Discussion. Their results also show, interestingly, that the genetic risk score for age at menarche is positively associated with the age at onset of the other puberty measures, including those of the boys as well as of the girls.

The use of the SITAR method is novel and interesting, but the results need a fuller explanation. A limitation of assessing peak height velocity is the availability of height data at the right ages. We need to be shown the numbers of heights at each age and the % of the children included with heights at most or all ages.

We have expanded the information in the paper to say: 'Height measurements of 10,236 participants at ages from 5 to 20, who had at least one measurement in each of the three time periods: 5 to <10, 10 to <15 and 15 to 20 years, had been used to derive the SITAR calculations by Frysz and colleagues (2018). In all, 61,290 height measurements were used.'

In the design of the new UK charts a simplified pubertal assessment approach was adopted with just 2 stages for each gender: starting puberty, which constituted any pubic hair in either gender, penile enlargement in boys or development of breasts in girls; and completion of puberty, being menarche in girls or voice breaking or underarm hair in boys. I am aware that this has not been fully written up yet, but as well as the brief mention cited here 1, I understand that it has been described in both Oxford Textbook of Medicine and Oxford Textbook of Endocrinology and Encyclopedia of Endocrine Diseases in chapters authored by Professor Gary Butler. It looks to me as if these data could be used to describe the age of each stage in the same way, and I wonder if this new approach should be mentioned here at least.

Thank you for telling us about this. On reflection, this Data Note is for an international audience and we feel that reference to UK standards might seem somewhat parochial if their justification has not yet been published in a peer-reviewed journal.

Having added or changed the above elements I would like to see either text or a table that makes recommendations about which questions to include for what purpose.

Thank you for this suggestion. We have tried to address the question towards the end of the Discussion.
The current section on strengths and limitations discusses the ALSPAC data as a resource, yet the data are now quite old and already much used. Surely the main point is to help people in their planning of future studies. In that case it is the strengths and limitations of the questions used that matter.

Conversely, we feel that, puberty being only one feature in the lifecourse, the value of these data in the future will be in identifying ways in which the timing of puberty of an individual may have a profound effect on their subsequent physical and mental health.

More minor points: Ages given in months in Table 1 would be much clearer if given as decimal age in years.

This has now been done.

Table 3 has a mistake in the bottom line: values for total N and boys have been displaced sideways.

Thank you for spotting this. It has now been corrected.

I was unconvinced by the relevance, in a paper on puberty assessment, of including information on genital defects or activity levels.

Apart from the importance of collecting information on genital defects in order to determine links with endocrine disruptors, information on genital defects and operations on genitalia was collected since researchers may wish to exclude boys with such defects from analyses on pubertal state. The details on exercise/activity were included in the puberty questionnaires since researchers may wish to determine whether the degree of such activities may delay (or advance) the onset of different stages of puberty. We have described these variables here so that researchers looking at the relationships of exercise and body mass index with different aspects of the onset of puberty will be aware that the data were collected at the same time points as the Tanner descriptions.

Are sufficient details of methods and materials provided to allow replication by others? Partly

We hope that our answers to your queries and comments will now assure you that the information in this Data Note will allow replication by others.

Figure 1. Age (months) at the onset of menstruation ascertained at the 13.5-year clinic.
doi.org/10.21956/wellcomeopenres.23227.r74383
© 2024 Hart L. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Laura Hart
1 Departments of Pediatrics and Medicine, The Ohio State University College of Medicine, Columbus, OH, USA
2 Center for Child Health Equity and Outcomes Research, The Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, USA

I agree with the authors that the edits to the introduction improve the article greatly.
Figure 1 histogram: suggest also showing age in years instead of months.

Reviewer Report 10 November 2023
https://doi.org/10.21956/wellcomeopenres.21924.r69487
© 2023 Hart L. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Laura Hart
1 Departments of Pediatrics and Medicine, The Ohio State University College of Medicine, Columbus, OH, USA
2 Center for Child Health Equity and Outcomes Research, The Abigail Wexner Research Institute, Nationwide Children's Hospital, Columbus, OH, USA

Competing Interests: None

Reviewer Report 25 October 2023
https://doi.org/10.21956/wellcomeopenres.21924.r68625
© 2023 Wright C. This is an open access peer review report distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Charlotte Wright
University of Glasgow, Glasgow, Scotland, UK

This interesting article describes the various ways in which puberty was assessed in the ALSPAC longitudinal cohort and gives a clear overall account of the data gathered and what it showed. The article states an aim to consider the advantages and disadvantages of different assessment approaches, but apart from a very interesting account of the difficulties of assessing genital size, it

References
1. R, Wright C: Using the new UK-WHO growth charts. Paediatrics and Child Health. 2014; 24 (3): 97-102

Is the rationale for creating the dataset(s) clearly described? Yes
Are the protocols appropriate and is the work technically sound? Yes
Are sufficient details of methods and materials provided to allow replication by others? Partly
Are the datasets clearly presented in a useable and accessible format? Yes

Competing Interests: No competing interests were disclosed.

Table 2. Content of the annual questionnaires.
Columns: Data collected; Questionnaire No. (1 to 9). First row group: Content common to both Boy/Girl questionnaires.
a Measured at the 15-year clinic examination; b Supplementary question concerned whether a doctor had been consulted; c Question changed from 'severe cramps' to 'pain with period' and a further question on whether mild, moderate or severe. This change only occurs in Questionnaire no. 7.

Table 4. The frequency of genital defects with which the boy was born and/or for which he had had surgery, as reported at age eight years.
*At time of completion of first puberty questionnaire

Table 7. Frequency of reported male change in voice across eight time points.
Columns: Q; Age (Y); No change; Changes occasionally
Q = questionnaire no.; Y = year of age; N = no. of individuals with data

Table 6. Frequency of reported Tanner stage of male pubic hair development across the nine time points.
Columns: Q; Age (y); Stage 1; Stage 2; Stage 3; Stage 4; Stage 5; N (= 100%)
Q = questionnaire no.; Y = year; N = no. of individuals with data

Table 5. Frequency of reported Tanner stage of male genital development across the nine time points.
Columns: Q; Age (y); Stage 1; Stage 2; Stage 3; Stage 4; Stage 5; N (= 100%)
Q = questionnaire no.; Y = year; N = no. of individuals with data

Table 8. Frequency of reported Tanner stage of female breast development across the nine time points.
Columns: Q; Age (y); Stage 1; Stage 2; Stage 3; Stage 4; Stage 5; unsure; N (= 100%)
Q = questionnaire no.; Y = year; N = no. of individuals with data

Table 9. Frequency of reported Tanner stage of female pubic hair development across the nine time points.
Columns: Q; Age (y); Stage 1; Stage 2; Stage 3; Stage 4; Stage 5; N (= 100%)
Q = questionnaire no.; Y = year; N = no. of individuals with data

Table 10. Frequency of reported female menstruation across eight time points.
Q = questionnaire no.; Y = year of age; N = no. of individuals with data

1, I understand that it has been described in both Oxford Textbook of Medicine and Oxford Textbook of Endocrinology and Encyclopedia of Endocrine Diseases in chapters authored by Professor Gary Butler. It looks to me as if these data could be used to describe the age of each stage in the same way, and I wonder if this new approach should be mentioned here at least. Having added or changed the above elements I would like to see either text or a table that makes recommendations about which questions to include for what purpose.