Genetic and non-genetic factors in prediction of early pubertal development in Chinese girls

Objective The objective of this study is to develop a combined predictive model for early pubertal development (EPD) in girls based on both non-genetic and genetic factors. Methods The case-control study encompassed 147 girls diagnosed with EPD and 256 girls who exhibited normal pubertal development. The non-genetic risk score (NGRS) was calculated based on 6 independent biochemical predictors screened by multivariate logistic regressions, and the genetic risk score (GRS) was constructed using 28 EPD related single-nucleotide polymorphisms (SNPs). Area under receiver operator characteristic curve (AROC), net reclassification optimization index (NRI) and integration differentiation index (IDI) were used to evaluate the improvement of adding genetic variants to the non-genetic risk model. Results Overweight (OR=2.74), longer electronic screen time (OR=1.79) and higher ratio of plastic bottled water (OR=1.01) were potential risk factors, and longer exercise time (OR=0.51) and longer day sleeping time (OR=0.97) were protective factors for EPD, and the AROC of NGRS model was 83.6% (79.3-87.9%). The GRS showed a significant association with EPD (OR=1.90), and the AROC of GRS model was 65.3% (59.7-70.8%). After adding GRS to the NGRS model, the AROC significantly increased to 85.7% (81.7-89.6%) (P=0.020), and the reclassification significantly improved, with NRI of 8.19% (P= 0.023) and IDI of 4.22% (P <0.001). Conclusions We established a combined prediction model of EPD in girls. Adding genetic variants to the non-genetic risk model brought modest improvement. However, the non-genetic factors such as overweight and living habits have higher predictive utility.


Introduction
Early pubertal development (EPD) is commonly defined as the development of secondary sexual characteristics of girls before 8 years old and boys before 9 years old (1).In recent years, the prevalence of EPD has shown an increasing trend.An observational study from the United States showed that 10% of white girls and 23% of black girls at the age of 7 years old have started puberty (2).According to a school-based survey in China, 11.47% of girls before age 8 and 3.26% of boys before age 9 had signs of EPD (3), with a higher incidence in girls than in boys.EPD will lead to accelerated skeletal maturation, advanced bone age, and early epiphysis closure, all of which will affect the final adult height.In addition, it may result in psychological issues or abnormal social behavior (4).Early menarche in girls is also related to long-term health consequences including obesity, type 2 diabetes, breast cancer, and cardiovascular events (5).Therefore, EPD has attracted the attention of global public health concerns, and it is very important to evaluate and diagnose girls suspected of EPD in time.
The timing of normal pubertal onset varies greatly and is influenced by environmental and genetic factors.It is considered that overnutrition, insufficient exercise, insufficient sleep (6), and expose to some endocrine-disrupting chemicals (EDCs) (7) are linked to early puberty.Currently, there are no risk prediction models for EPD developed worldwide, and the prediction of EPD risk has not been part of routine practice for families.
In addition to environmental factors, many studies have examined the relationship between candidate genes and pubertal timing.Mutations in Delta-like homolog 1 (DLK1), the kisspeptin system (KISS1 and KISS1R), and the Makorin RING-finger protein 3 (MKRN3) gene have been identified in sporadic and familial cases of central precocious puberty (8,9).Besides, some large-scale genome-wide association studies (GWAS) have begun to search for single-nucleotide polymorphisms (SNPs) that are associated with the onset time of puberty (10).One single SNP has a low explanation ability in disease risk, but these susceptibility loci in combination could explain more disease risk variation.Until now, there is no research on constructing a genetic risk score based on the known EPD risk alleles.
The aim of this study was firstly to establish a non-genetic risk model to predict EPD in girls, and then assess the predictive ability improvement as well as the reclassification of adding genetic variants to the non-genetic risk model.

Study design and participants
This was a case-control study conducted between October 2019 and August 2022 in Tianjin Women and Children's Health Center, China.It encompassed 147 girls who were newly diagnosed with EPD and 256 girls who exhibited normal pubertal development.The inclusion criteria of case group were: 1) Girls.2) Confirm the Chinese diagnostic criteria for EPD in girls (11), which involves the development of secondary sexual characteristics before the age of 8 or menstruation before the age of 10.All girls were physically examined by qualified female pediatricians in a private room, and then were given a B-ultrasound test of breast, uterus and ovary using a Philips EPIQ7 ultrasonic instrument.The Tanner Staging was used to evaluate children's secondary sexual characteristics (12,13).3) Children and their guardians agreed to participate in the study and signed the informed consent form.Exclusion criteria include: 1) Secondary central precocity, such as central nervous system occupying, postoperative, infection, trauma, chemotherapy or radiotherapy, and congenital dysplasia.2) Other primary diseases that may lead to EPD, such as McCune-Albright syndrome, congenital adrenal hyperplasia, granulomatous disease, congenital hypothyroidism, cerebral palsy, and malignant tumor.3) A history of using hormone drugs.
Based on the routine health examination of kindergartens/ primary school students in Tianjin, 256 girls of the same age (7.72 ± 1.0 vs. 7.60 ± 1.0 years) who exhibited normal pubertal development were recruited as the control group.The inclusion criteria for the control group were: 1) With normal pubertal development, and without secondary sexual development before the age of 8.All girls were physically examined by qualified female pediatricians in a private room, and Tanner Staging method was used to evaluate children's secondary sexual characteristics.2) Children and their guardians agreed to participate in the study and signed the informed consent form.Exclusion criteria include: 1) Primary diseases that may lead to EPD, such as McCune-Albright syndrome, congenital adrenal hyperplasia, granulomatous disease, congenital hypothyroidism, cerebral palsy and malignant tumor.2) A history of using hormone drugs.
This study was approved by the Ethics Committee of Tianjin Women and Children Health Center (approval number: ky-20190119), and all guardians of the research subjects have signed the informed consent form.

Data collection methods
The weight, height, and waist circumference of children were measured according to standardized procedures.Using a digital scale (TCS-60, Tianjin Weighing Apparatus Co., China), body weight was measured to the nearest 0.01 kg.Using a station meter (SZG-180, Shanghai Zhengdahengqi, China), standing height was measured to the nearest 0.1 cm.Children's waist circumference was measured midway between the lower rib margin and the iliac crest, and the measurement was accurate to the nearest 0.1 cm.A body composition analyzer (Inbody J-20, Korea) was used to measure the body fat percentage of children.Body mass index (BMI) was calculated by dividing weight (kg) by the square of height (m).According to the WHO age-and genderspecific growth reference of 0~60 months (14) and 5~19 years old (15), children's Z scores of BMI for age were calculated.Overweight/obesity in children was defined as BMI ≥ 85 th percentiles (Z score of BMI for age ≥ 1.035) which included both overweight and obesity to improve the statistical power of test.
We obtained the information of the participants through a questionnaire survey.The researcher filled out the questionnaires during a face-to-face interview with children and their guardians.The questionnaire consisted of five sections: 1) General information (date of birth, age, and ethnicity); 2) Parents' information (parents' weight and height, mother's first menarche age, and father's first spermatogenesis age); 3) Diet information for the previous month and the average per day was calculated, by asking "How often did you eat snacks, including chips, cookies, candy, cake, chocolate, ice cream and other desserts?","How many eggs did you eat, including all kinds of cooking like poaching, frying, and being included in other foods?"; 4) Behavior information in 1 month prior to the survey, and the average time per week was calculated, including: electronic screen watching time, moderate-to-vigorous physical activity time (MVPA) time, and also time spent on the roadside where children might be exposed to automobile exhaust (including walking, biking or parking on the road or roadside); the proportion of plastic bottled water in total drinking water was also investigated.Moderate physical activity is defined as 12-14 of 20 grades in the RPE scale (16), and the intensity is 3.0 to 5.9 metabolic equivalent (MET), such as skating, jogging, cycling at normal speed, etc. Vigorous physical activity is defined as 15 or above of 20 grades in the RPE scale, and the intensity is ≥ 6 MET, such as running fast, carrying heavy objects, cycling fast, etc; 5) Sleep habits in 1 month prior to the survey, and the average time per day was calculated, including: bedtime at night, wakeup time in the morning, average time of day sleeping and average time of night sleeping.The total sleeping time was obtained by night sleeping time plus day sleeping time.

SNPs selection and genotyping
The buccal mucosa cells of children were collected using two buccal swabs by trained investigators according to the manufacturers' instructions.According to the manufacturer's instructions (Epicentre Biotechnologies, Madison, WI), DNA was extracted from two buccal swabs (placed in the same test tube) by the heat lysis method.SNP typing was performed by MassARRAY flight mass spectrometry.The success rate of genotyping was >98%.In order to control the quality, 10% of the samples were re-genotyped, and the coincidence rate was >99%.
Mutations in the MKRN3 gene, the kisspeptin system (KISS1 and KISS1R), and DLK1 have been identified in sporadic and familial cases of central precocious puberty (9), and 13 SNPs were selected from these genes.Another 15 highly correlated SNPs were selected from the GWAS conducted in East Asia (10).The SNP, reported gene, functional class, and alleles are presented in Table 1.

Score construction
We calculated a genetic risk score (GRS) with the selected 28 SNPs.Logistic regression was conducted to determine the association between the number of risk alleles and EPD.The weighted GRS was computed by multiplying the number of risk alleles (0, 1, or 2) of each SNP by the natural logarithm of OR in the Logistic regression for that allele and summing across all SNPs.Similarly, the calculation principle of the weighted non-genetic risk score (NGRS) was the same as that of GRS.For each individual, the NGRS was calculated by the sum of risk factors weighted by OR (b) values of different non-genetic risk factors in Logistic stepwise regression.Assuming that genetic and non-genetic factors were independent, we added the weighted GRS to each risk algorithm to obtain a combined genetic and non-genetic score (CRS).

Reclassification of EPD risk
We used the net reclassification improvement (NRI) and integrated discrimination improvement (IDI) to quantify the extent to which CRS moved people to risk categories that better reflected their event status (17, 18).Similar to NRI, IDI large than zero reflected a positive improvement, meaning that the prediction ability of the new model was improved compared with the old model.We used both categories (EPD risk: < 10%, 10% -<90%, and ≥ 90%) and continuous EPD risk to calculate NRI.Girls with EPD were considered to be correctly reclassified if they moved to a higher risk category, while those who moved to a lower risk category were considered to be incorrectly reclassified.Girls without EPD were the opposite.We also used continuous EPD risk to calculate NRI, which reflected the change defined by any upward or downward change of the specified risk.

Statistical analyses
The general characteristics between groups were compared by using Chi-square test for categorical variables, and student's t-test for continuous variables if their normal distribution was not rejected.The prediction models were constructed by logistic regression analyses.The area under receiver operating characteristics curve (AROC) was utilized to evaluate the predictive abilities of the GRS, NGRS and CRS models, and the DeLong test was used to compare the AROC values of different models (19).
A family history of diseases, in specific cases, reflects genetic predisposition, so there is a strong association between the family history of EPD and EPD related GRS.In addition, family disease history can share environmental and lifestyle factors, and even inclusive fetal programming.Therefore, we included two models when constructing NGRS and CRS: Model

General characteristics
The baseline characteristics of the participants were showed in Table 2.The age of girls in the case group and the control group were 7.72 ± 1.0 years and 7.60 ± 1.0 years, respectively, and there was no statistical difference between the two groups (t=-1.300,P=0.194).There were significant differences in children's weight, height, waist circumference, BMI, Z scores for BMI, MVPA, electronic screen time, sedentary time, the proportion of plastic bottled water, time on the roadside, bedtime at night, day sleeping time, night sleeping time, and total sleeping time (all P values <0.05).

Prediction model based on NGRS
Factors that were statistically correlated with EPD in univariate analyses as shown in Table 2 were incorporated into multivariate logistics regression using the stepwise method to select the potential influencing variables.Overweight (OR=2.74),longer electronic screen time (OR=1.79)and higher ratio of plastic bottled water (OR=1.01)were potential risk factors, and longer exercise time

Prediction model based on GRS
The associations of SNPs with EPD are presented in Table 1.The weighted GRS was computed by multiplying the number of risk alleles (0, 1, or 2) of each SNP by the natural logarithm of OR in the Logistic regression for that allele and summing across all SNPs.The mean gene count score was 5.00 (SD 0.83) in girls with EPD and The t-test was used to compare the continuous variables and the results were presented in mean ± SD.The c 2 test was used to compare the categorical variables and the results were presented as frequency and percentage (%).*Overweight was defined as BMI ≥ the 85 th percentiles for age-and sex-specific distribution according to WHO age-and sex-specific growth reference.

Reclassification of CRS
We used NRI to assess the extent to which adding GRS to the NGRS model resulted in the movement of prediction accuracy of Area under receiver operator characteristic curve (AROC) for early pubertal development.The non-genetic risk score (NGRS) was calculated from 6 biochemical predictors of independent risk (Model 1), and the genetic risk score (GRS) was constructed using 28 EPD related single-nucleotide polymorphisms (SNPs).EPD.In these analyses, we used the same three categories (< 10%, 10% -<90%, and ≥ 90%) and did the analyses separately for girls diagnosed as having EPD and those without EPD.As shown in Table 4, the addition of GRS to NGRS resulted in a NRI of 8.19% (95% CI: 1.11% -15.27%, P= 0.023) in the categorical analysis, and of 50.75% (95% CI: 31.15%-70.35%,P <0.001) in the continuous analysis.The IDI was also calculated to reflect the extent to which adding GRS to the NGRS model resulted in the movement of prediction accuracy of EPD, and it was 4.22% (95% CI: 2.20% -6.23%, P <0.001).

Discussion
Developing a prediction model for identifying individuals at risk is important to formulate measures for preventing or delaying disease onset.To our knowledge, there are no risk prediction models for EPD developed currently.The present study established a combined predictive model for EPD in girls based on both non-genetic and genetic factors.It also assessed the predictive ability improvement as well as the reclassification of adding genetic variants to the non-genetic risk model.
The timing of normal pubertal onset is influenced by both environmental and genetic factors.It is considered that overnutrition, insufficient exercise (20), insufficient sleep (6), as well as expose to some EDCs (7) are linked to EPD.In this study, we found that longer day sleeping time may be a protective factor for EPD.For adults, daytime nap seems to be beneficial to performance on certain cognitive tasks, and it has been suggested a modest causal association between habitual daytime napping and larger total brain volume using Mendelian randomization (21).For children, clinical observation showed that children who did not nap might be fussier and had shorter attention duration than children who napped regularly (22), but there is no research to date evaluating the association of napping with EPD.According to the Guide of Chinese Sleep Medical Society, sleeping time for children aged 6-8 should be 10-12 hours/day.In our study, night sleeping time of girls in both of the two groups was less than 10 hours/day.After adding day sleeping time, the total sleeping time of girls in the control group was 10.26 ± 0.7 hours/day, but the total sleeping time of girls in the case group was still insufficient (9.70 ± 1.0 hours/day).Although insufficient sleep was suggested a risk factor for EPD (6), the difference of total sleeping time between the two groups was not significant.Until now, the mechanism of the association of day sleeping time with EPD is not known, and future studies are needed to explore the related mechanism.
Currently, there are no risk prediction models for early pubertal development (EPD) developed worldwide.The present study confirmed that overweight, longer electronic screen time and higher ratio of plastic bottled water were potential risk factors, and longer exercise time and longer day sleeping time were protective factors for EPD.By constructing the NGRS as the sum of the above affecting factors, the NGRS performed well, with an AROC of 83.6%.
A family disease history, in specific cases, not only reflected genetic tendency but also reflected shared environmental and lifestyle factors, even including fetal programming.In fact, researches showed that a complete family history provided a better prediction than SNPs (23), and family history remained a strong, independent, and easily-to-assess risk factor for some diseases (24).The present study found that mother's first menarche age and father's first spermatogenesis age were   (6,7).However, such SNPs individually have low predictive ability for the risk of EPD.The GRS provides an opportunity to evaluate the cumulative effects of genetic factors.The present study showed a low effect (OR: 1.00-1.68) of per individual allele, but combining the markers could predicted greater risk (OR≈2).The GRS utilized herein had an AROC of 65.3%.
While the GRS had a lower predictive ability for EPD compared to NGRS, combining both factors prompted a modest increase (2.1% or 1.3% for models without or with family history related factors, respectively).Combining GRS and NGRS resulted in the movement of prediction accuracy of EPD, with NRI of 8.19% and IDI of 4.22%.Specific information for EPD predictive models remained scarce, so we cannot compare our results with others.When compared with type 2 diabetes prediction models, the results of our prediction models were reasonable.The predictive models for type 2 diabetes showed that inclusion of genetic biomarkers resulted in a slight improvement, with differences in AUC ranging from 0 to 12% and NRI from -2.2% to 10.2% (25).
Although the risk classification of EPD has been improved after adding GRS, the moderate impact needs to be considered, and there is not enough evidence to suggest that GRS should be included in clinical practice.However, genetic risk factors remain unchanged throughout the course of life, and the cost of genotyping is much higher than that of conventional risk factors.The conventional risk factors provide a nice predictive value of EPD risk classification, and in most cases, it can be easily obtained only by a medical history, physical examination and a questionnaire.Thus, we highlighted that conventional risk factors, such as family history related factors, overweight, adverse lifestyle, should be considered a priority into clinical practice.
One of the limitations of this study is that our results may not be extended to people of different ethnic backgrounds.Secondly, the cross-sectional design limited the ability to determine the influence of affecting factors on the progression of EPD over time.Thirdly, we could not confirm the relationship between dietary habits and EPD risk, since the nutritional self-management for girls in the case group was established at the time of or even before diagnosis.

Conclusion
In conclusion, the current study established a combined prediction model of EPD in girls.Adding genetic variants to the non-genetic risk model brought modest improvement.However, the non-genetic factors such as overweight and living habits have higher predictive utility.We emphasize that priority should be given to preventing environmental exposure over unchangeable genetic factors.

FIGURE 1
FIGURE 1 1, without mother's first menarche age or father's first spermatogenesis age; Model 2, with mother's first menarche age and father's first spermatogenesis age.The criterion of statistical significance was < 0.05 (for two-sided tests).All statistical analyses used IBM SPSS Statistics 24.0 (IBM SPSS, Chicago, IL) and R 4.0 (R Foundation for Statistical Computing, Vienna, Austria) software programs.

TABLE 1
List of the 28 single nucleotide polymorphism and their association with early puberty development.The top 15 SNPs from the GWAS conducted in East Asia factors for EPD.As shown inTable 3 (Model 1), the AROC of NGRS model was 83.6% (79.3-87.9%).After taking into account the influence of family history (Model 2), mother's first menarche age and father's first spermatogenesis age were inversely associated with EPD, and the AROC of NGRS model improved [86.1% (82.1-90.1%)].

TABLE 2
General information for the case and control groups.

TABLE 3
Odds Ratios (95% CIs) and area under receiver operating characteristics curve for genetic risk score, non-genetic risk score and combined risk score models.Model 1, without mother's first menarche age or father's first spermatogenesis age; Model 2, with mother's first menarche age and father's first spermatogenesis age.DeLong's test for the difference between AROCs of NGRS model and CRS model: *Z = -2.33,P=0.020; # Z = -1.78,P=0.074.'-' means that there was no value to display.

TABLE 4
NRI and IDI based on addition of GRS to NGRS, calculated using risk cutoffs of 10% and 90%.