A proof of concept of a machine learning algorithm to predict late-onset 21-hydroxylase deficiency in children with premature pubic hair

In children with premature pubarche (PP), late-onset 21-hydroxylase deficiency (21-OHD), also known as non-classical congenital adrenal hyperplasia (NCCAH), can be routinely ruled out by an adrenocorticotropic hormone (ACTH) test. Using liquid chromatography-tandem mass spectrometry (LC-MS/MS), a quantitative assay of the circulating steroidome can be obtained from a single blood sample. We hypothesized that, by applying multivariate machine learning (ML) models to basal steroid profiles and clinical parameters of 97 patients, we could distinguish children with PP from those with NCCAH, without the need for ACTH testing. Every child presenting with PP at the Trousseau Pediatric Endocrinology Unit between 2016 and 2018 had basal and stimulated steroidome measurements. Patients with central precocious puberty were excluded. The first set of patients (year 1, training set, n=58), including 8 children with NCCAH verified by ACTH test and genetic analysis, was used to train the model. Subsequently, an additional set of patients (year 2, validation set, n=39 with 5 NCCAH) was used to validate our model. We designed a score based on an ML approach (orthogonal partial least squares discriminant analysis). A metabolic footprint was assigned for each patient using clinical data, bone age, and adrenal steroid levels recorded by LC-MS/MS. Supervised multivariate analysis of the training set (year 1) and validation set (year 2) was used to validate our score. Based on selected variables, the prediction score was accurate (100%) at differentiating premature pubarche from late-onset 21-OHD patients. The most significant variables were 21-deoxycorticosterone, 17-hydroxyprogesterone, and 21-deoxycortisol. We propose a new ML-based test that has excellent sensitivity and specificity for the diagnosis of NCCAH.

phenotype of NCCAH is determined by the less severe mutation of CYP21A2 [9]. However, molecular diagnosis is mandatory for family planning [10].
Based on the current consensus [11], adrenocorticotropic hormone (ACTH) testing is performed on a routine basis for all children with PP. NCCAH is currently defined by a post-ACTH test level of 17-hydroxyprogesterone in the serum greater than 10 ng/ml [12]. After parental consent (according to the Declaration of Helsinki), these results must be confirmed by a genetic test of the CYP21A2 gene. The use of mass spectrometry for multidimensional steroid profiling in 21-OHD is well established, starting with the use of gas chromatography-mass spectrometry in the 1980s, when urinary steroid profiles were useful, especially in newborns [13,14]. Furthermore, the development of liquid chromatography-tandem mass spectrometry (LC-MS/MS) methods has improved the specificity of steroid analysis in the diagnosis of adrenal diseases [15,16], and especially of CAH [17,18], when compared with immunoassays [19], because these methods are free from cross-reactions of 17-OHP with other steroids [20] and can reduce false positive results. Spectrometry methods can also reduce the number of unnecessary tests and ease the anxiety of the patients [21]. With LC-MS/MS, a circulating steroidome quantitative assay [22], including more than 16 molecular species, is now available in a single analysis.
We hypothesized that combining patient anthropometric data with the analytical performances of mass spectrometry and the use of multivariate statistical approaches, such as machine learning algorithms, could lead to the development of a new practical diagnostic tool for clinical routine analysis. The aim of the present study was to generate a score based on multivariate training to easily and accurately distinguish children with PP from those with NCCAH. Routine ACTH tests would therefore be unnecessary and limited to ruling out adrenal insufficiency (AI).

Patients
Leftover serum samples from all children who had routine ACTH tests to screen for NCCAH at the Trousseau Pediatric Endocrinology Unit from June 2016 to July 2018 were used for extended profiling of 16 steroids. All patients came to the endocrine unit because of premature pubic hair, to rule out either adrenal disease or precocious puberty. Only children with early pubic hair (before 8 years in girls, and before 9 years in boys) were included in the study. Children with central precocious puberty (CPP) (Tanner stage > 1 associated with a luteinizing hormone (LH) peak > 5 after gonadotropin-releasing hormone (GnRH) testing) were excluded. One younger patient (1 year and 3 months old) without premature pubic hair was included because of high basal 17-OHP levels in an unrelated context of cryptorchidism; one patient was older than the others (12 years and 10 months old) and had been tested because of Prader-Willi syndrome-associated PP [23].
The first set of patients (year 1, n = 58), recruited from June 2016 to July 2017, was used to train the model. Subsequently, a validation set (year 2) of an additional group of patients (n = 39) was obtained to validate our model (Fig. 1). The latter had the same inclusion and exclusion criteria as patients in the training set. The inclusion of patients in the validation set started in November 2017 and ended in July 2018. The study was conducted in accordance with the Declaration of Helsinki.
The model was first trained and tuned (feature selection) on a training set of children (n = 63 with exclusion of 5 boys with CPP) from a pediatric endocrinology unit (year 1) and internally evaluated using cross-validation. A validation set of children (year 2), recruited consecutively to the training set from the same clinical institution, was used for external evaluation (n = 39). The results were matched and labeled according to the expected diagnosis output.

ACTH test
All patients underwent ACTH testing at 8:30 AM, after a night of fasting. Physical examination was performed by a pediatric endocrinologist. The data collected included height in standard deviation score (SDS), growth velocity in SDS, body mass index (BMI) in Z-score, pubertal status as defined by Tanner, and bone age (BA) increment (BA − chronological age). Growth curves were standardized according to Sempé [24]. BMI was analyzed according to Rolland-Cachera standards [25]. The BA was determined according to the method of Greulich and Pyle [26]. When the patients had breast development or increased testicular volumes, they also had a GnRH test. Steroids were assayed by LC-MS/MS at baseline and after ACTH stimulation, as previously described [22]. Informed consent for CYP21A2 gene analysis was obtained from the patients themselves and their parents when 17-OHP levels were above 10 ng/ml, as approved by our local ethics committee. Following the criteria defined by Kuttenn et al. [12], patients with 17-OHP concentrations above 10 ng/ml after ACTH test and bearing abnormalities in the CYP21A2 gene on different alleles were defined as NCCAH. In our cohort, all NCCAH patients presented a 17-OHP concentration above 22 ng/ml after ACTH testing. All PP patients exhibited a 17-OHP concentration below 5 ng/ml after ACTH test (except one heterozygous patient with a 9 ng/ml peak 17-OHP).
Fig. 1. Data flow diagram for the study (17-OHP > 10 ng/ml corresponds to the 17-OHP stimulated concentration).
Briefly, a mixture of the deuterated internal standard (150 µL) was added to 50-100 µL of serum. The solution was mixed and left standing for 5 min, then loaded into an Isolute SLE + 0.4 ml cartridge (Biotage, Uppsala, Sweden). The samples were allowed to adsorb for 5 min before elution of the steroids through the addition of 2 × 0.9 ml methylene chloride. The eluates, which contained the non-conjugated steroids, were evaporated until dry and reconstituted to 150 µL in methanol/water (50/50, volume-to-volume ratio). Steroids were chromatographically separated by high-performance liquid chromatography using a Shimadzu Nexera XR system (Shimadzu France, Marne la Vallee, France) and a Coreshell C18 column (Kinetex, 2.6 µm 100 Å 100 × 2.1 mm; Phenomenex, Le Pecq, France). Detection was performed using a triple quadrupole mass spectrometer (Triple Quad 6500, ABSciex, Foster City, CA). Upon collection, the LC-MS/MS data were analyzed using Multi-Quant software (ABSciex, Foster City, CA, version 3.0) with built-in queries and quality control rules that allowed for compound-specific criteria for flagging outlier results. These flagging criteria included accuracies for standards and quality control, quantifier ion/qualifier ion ratios, and lower/upper calculated concentration limits. For each calibration curve, the regression line used for quantification was calculated using least-squares weighting (1/x).
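The 1/x-weighted calibration mentioned above can be illustrated with a minimal sketch. It assumes a straight-line calibration model and the usual meaning of 1/x weighting (each calibrator's squared residual is weighted by the reciprocal of its nominal concentration). The function names and the calibrator values are hypothetical; they are not taken from the quantification software or from this study.

```python
def weighted_linear_fit(x, y, weights):
    """Return (intercept, slope) minimising sum(w * (y - a - b*x)**2)."""
    total_w = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total_w
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total_w
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - mean_x) * (yi - mean_y)
              for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def calibrate_1_over_x(concentrations, responses):
    """Fit a calibration line with 1/x weights (concentrations must be > 0)."""
    weights = [1.0 / c for c in concentrations]
    return weighted_linear_fit(concentrations, responses, weights)


def back_calculate(response, intercept, slope):
    """Convert a measured response into a concentration using the fitted line."""
    return (response - intercept) / slope


# Hypothetical calibrators (ng/ml) and analyte/internal-standard area ratios.
CAL_CONC = [0.1, 0.5, 1.0, 5.0, 10.0, 50.0]
CAL_RESP = [0.021, 0.101, 0.198, 1.004, 1.990, 10.02]
INTERCEPT, SLOPE = calibrate_1_over_x(CAL_CONC, CAL_RESP)
```

Compared with an unweighted fit, the 1/x weights keep the high calibrators from dominating the line, which generally improves back-calculated accuracy at the low end of the range.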

Data processing and statistical analysis
Biological and clinical data were combined and subjected to multivariate analyses and were compared between the NCCAH children and the other children (referred to as the PP group) from the training set. Univariate analysis was first performed using Mann-Whitney testing according to the number of observations (statistical significance threshold p < 0.05).
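As an illustration of the univariate step, the sketch below computes a Mann-Whitney U statistic with a two-sided p-value. It assumes the normal approximation with tie correction and no continuity correction, which is only one of several ways to obtain the p-value; the study does not state which variant was used. The function names are hypothetical.

```python
import math
from collections import Counter


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, two-sided p-value by normal approximation)."""
    n1, n2 = len(group_a), len(group_b)
    pooled = list(group_a) + list(group_b)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all observations identical: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

With groups as small as those in the training set, an exact p-value may be preferable to the normal approximation used in this sketch.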
The raw data were then loaded into SIMCA 15 software (version 15, Umetrics, Västerbotten, Sweden) and analyzed using principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA) after standardization (removing the mean and scaling to unit variance). A metabolic footprint was assigned for each patient using clinical data and the baseline adrenal steroid levels recorded by LC-MS/MS. The goal was to determine whether basal levels can be as accurate as an ACTH test in predicting NCCAH.
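The standardization step (mean removal and scaling to unit variance) can be sketched as follows. This is a hedged illustration, not the SIMCA implementation: it assumes the sample standard deviation (n − 1 denominator) and assumes that new observations, such as a validation set, are scaled with the parameters learned from the training data. The function names are hypothetical.

```python
import math


def autoscale(rows):
    """Centre each column and scale it to unit variance.

    Returns the scaled rows together with the column means and standard
    deviations so that the same transformation can be reused later.
    Assumes at least two rows and no constant column.
    """
    n = len(rows)
    n_cols = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(n_cols)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(n_cols)
    ]
    scaled = [[(r[j] - means[j]) / sds[j] for j in range(n_cols)] for r in rows]
    return scaled, means, sds


def apply_scaling(rows, means, sds):
    """Scale new observations with parameters learned on the training data."""
    return [[(r[j] - means[j]) / sds[j] for j in range(len(means))] for r in rows]
```

Scaling puts steroids measured at very different concentrations, as well as clinical variables, on a comparable footing before PCA or OPLS-DA.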

Unsupervised multivariate analysis of the training set
The purpose of this first step was descriptive; the goal was to provide an overview of the distribution of the two groups of patients and the variability of the system. PCA, an unsupervised method (dimension reduction), was applied to the data to observe the possible presence of trends and groupings, which were previously not obvious by observing the raw data. PCA allowed the detection of potential outliers as well. PCA is used to preprocess and reduce the dimensionality of datasets, while preserving the original structure and relationships inherent to the dataset. The resulting data were displayed as score plots representing the distribution of the samples in multivariate space.
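To make the PCA step concrete, the sketch below extracts the first principal component of a small data matrix by power iteration on the covariance matrix. It returns the loading vector, the sample scores (the coordinates shown on a score plot), and the fraction of total variance explained by that component. This is a simplified stand-in under stated assumptions: SIMCA uses its own algorithm, and the function name, iteration settings, and fixed random seed are hypothetical choices.

```python
import math
import random


def first_principal_component(rows, n_iter=500, tol=1e-12, seed=0):
    """Return (loadings, scores, explained_variance_ratio) for component 1.

    Assumes the data have at least one direction with non-zero variance.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
        for a in range(p)
    ]
    # Random start vector, so that it is unlikely to be orthogonal to the
    # leading eigenvector.
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        w = [c / norm for c in w]
        converged = sum((a - b) ** 2 for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    scores = [sum(c[j] * v[j] for j in range(p)) for c in centred]
    eigenvalue = sum(
        v[a] * sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)
    )
    total_variance = sum(cov[j][j] for j in range(p))
    return v, scores, eigenvalue / total_variance
```

The explained-variance ratio plays the same role as the cumulative R2X reported for a PCA model, here restricted to a single component.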

Supervised multivariate analysis of the training set
The objective of this next step was to determine if the distribution was significantly different between the NCCAH patients and the PP patients of the training set. A discriminant analysis of variables was performed. The assignment of patients in each group (NCCAH and PP) was made a priori for the construction of the model. OPLS-DA, a supervised multivariate ML algorithm [27], was applied to the training set. Based on recent studies, the OPLS approach was considered particularly appropriate, given the input data type (multicollinearity) and size of the present study [28].
The developed model can be evaluated using several parameters, such as goodness of fit (R2), a permutation test, and the capability to predict (Q2). Eventually, owing to the validation set, we were able to build a predictive classification score. In order to determine the relevance of the variables and to rank their predictive capacity in the model, we used variable importance in projection (VIP) [29].
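The evaluation parameters above can be illustrated with a deliberately simplified sketch. Instead of OPLS-DA, it fits a one-component PLS regression of a 0/1 class label on the predictors, then computes R2 (goodness of fit), a leave-one-out Q2 (1 − PRESS/total sum of squares), and the single-component VIP. The choice of one component, leave-one-out folds, and all function names are assumptions made for illustration; SIMCA's OPLS-DA, its cross-validation scheme, and its multi-component VIP differ in detail.

```python
import math


def fit_pls1(rows, y):
    """One-component PLS regression of y on the columns of rows.

    Assumes at least one predictor covaries with y.
    """
    n, p = len(rows), len(rows[0])
    x_mean = [sum(r[j] for r in rows) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[r[j] - x_mean[j] for j in range(p)] for r in rows]
    yc = [v - y_mean for v in y]
    # Weight vector: covariance of each predictor with y, scaled to unit norm.
    w = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(c * c for c in w))
    w = [c / norm for c in w]
    t = [sum(xi[j] * w[j] for j in range(p)) for xi in xc]  # scores
    c = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return {"w": w, "c": c, "x_mean": x_mean, "y_mean": y_mean}


def predict(model, row):
    """Predicted class value for one observation."""
    t = sum((row[j] - model["x_mean"][j]) * model["w"][j]
            for j in range(len(row)))
    return model["y_mean"] + model["c"] * t


def r2y(rows, y):
    """Fraction of the variation in y explained by the fitted model."""
    model = fit_pls1(rows, y)
    y_mean = sum(y) / len(y)
    rss = sum((yi - predict(model, r)) ** 2 for r, yi in zip(rows, y))
    tss = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - rss / tss


def q2_leave_one_out(rows, y):
    """Cross-validated Q2: each observation is predicted by a model fitted
    without it."""
    press = 0.0
    for i in range(len(rows)):
        model = fit_pls1(rows[:i] + rows[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(model, rows[i])) ** 2
    y_mean = sum(y) / len(y)
    tss = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - press / tss


def vip_single_component(model):
    """VIP for a one-component model: sqrt(p) * |w_j| (mean VIP^2 equals 1)."""
    p = len(model["w"])
    return [math.sqrt(p) * abs(wj) for wj in model["w"]]
```

A variable with VIP above 1 contributes more than average to the model, which is the reading applied to the VIP threshold used in this study.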

Patient characteristics from the training and validation sets
All patients visited the endocrine unit because of precocious pubic hair to exclude adrenal disease or precocious puberty. A total of 97 patients (81 girls and 16 boys) were included in our study: 58 patients in the training set and 39 in the validation set (Fig. 1).
All patients were in Tanner stage 1 (prepubertal stage) (Table 1). Thirteen patients were diagnosed with NCCAH by 21-OHD, confirmed by CYP21A2 genetic analysis: 8 patients from the training set and 5 patients from the validation set (Table 2). All these patients presented with a V281L mutation; 54% had an additional severe mutation or deletion. Twelve of the 13 had partial AI, defined as a post-ACTH cortisol peak < 180 ng/ml. Basal levels of 17-OHP ranged from 3.62 ng/ml to 87.94 ng/ml. Peak 17-OHP levels after ACTH testing ranged from 22.73 ng/ml to 92.57 ng/ml (Table 2).
A total of 58 patients (45 girls and 13 boys) with a mean age of 7.14 years (+/− 2.32) were included in our training set (Fig. 1). Five boys had an increase in testicular volume (> G1) with peak LH > 5 after GnRH test and were not included in the study. Nine patients after ACTH testing had a peak 17-OHP level > 9 ng/ml. CYP21A2 gene analysis was performed on these patients. A total of eight patients were diagnosed with homozygosity for NCCAH and a diagnosis of heterozygosity for 21-hydroxylase was made in one girl, an 8-year-old with pubic hair appearance at the age of 7 years and 3 months. She had an accelerated growth velocity (10 cm in one year, 5.4 SDS) and a two-year BA advance; 17-OHP basal level was 0.57 ng/ml with a peak at 9.06 ng/ml. Sequencing of the CYP21A2 gene found a moderate mutation p.H38L in a heterozygous state. Table 3 compares the basal biological characteristics of the children in the training set. The difference between NCCAH and PP children was significant (p < 0.05) for A4, T, 17-OHP, 21-DB, 21-DF, PREG, 11OHA4, and P.
Table 1. Clinical characteristics (means ± SD) of patients from both training and validation sets.
There was no overlap for the 17-OHP and 21-DB values between the two groups in both sets, with significantly increased 17-OHP, 21-DF, and 21-DB levels in the NCCAH children (Fig. 2).
Unsupervised multivariate analysis of the training set was first completed. The descriptive analysis showed a good separation between the NCCAH group and the PP group (R2X = 0.350). The variables with poor discrimination, growth velocity and BMI, were removed from the study.

Model construction
Secondly, a score plot was obtained from a supervised multivariate analysis of the training set using OPLS-DA (Fig. 3A). The observations were projected onto the system's plane of maximum variability (R2X = 0.900 and Q2X = 0.868) on two components. NCCAH patients were segregated on the left side of the score plot (red dots) while PP patients were segregated in the right quadrants (green dots). There was a good separation of the groups without any overlap. The heterozygous patient was segregated in the PP group.
To build a prediction score, the contribution of each variable of our system was calculated (Supplementary Figure 1). The higher the absolute value of the coefficient of a variable, the more discriminatory and important the variable was for prediction and group separation. The coefficients were calculated using the algorithm established by the OPLS-DA regression. As shown in Supplementary Figure 1, the variables 21-DB, 17-OHP, 21-DF, 11OHA4, and P had the five highest contribution coefficients for NCCAH group discrimination, followed by BA, the only discriminating clinical factor with a VIP score > 1.

Validation of the model with the replication cohort
The predictive capability of the model was good according to internal cross-validation (Q2X = 0.868). A validation set of 39 patients (36 girls and 3 boys) was used for a second validation step. The average age was 7.50 years (+/− 1.46), and no patient was excluded. Five patients had a peak 17-OHP level > 10 ng/ml and were diagnosed as NCCAH after CYP21A2 gene analysis (see Table 1, patients 9-13).
Of note, by definition, the data from the validation set were not used to refit the model; they were only used to test its accuracy. To mimic future prospective routine diagnosis, observations from the validation set were first labelled with black stars, as no a priori class attribution was made (Fig. 3B). Eventually, as seen in Fig. 3C, to evaluate the true model performances, the black stars corresponding to children from the validation set were labeled according to their retrospective final diagnosis: NCCAH (n = 5, red stars) and PP (n = 34, green stars). Five patients of the validation set were segregated in the same area of the score plot as the NCCAH patients based on the training set. These 5 patients were eventually a posteriori genotyped as NCCAH. The misclassification table (Table 4) summarizes the prediction results of the model. The first column displays the total number of observations in the training set and the validation set, the average percentage correctly classified, and the number of observations classified to each class. As mentioned in the No class column, the model never classified any of the patients to other classes than those available (i.e., YPred score value < 0). Fisher's probability is the probability of the table occurring by chance and is satisfied when p < 0.05 for 95% confidence. Here, the success rate for the prediction of the membership class was 100%, with no false positive or false negative results. The five NCCAH patients in the validation set (post-hoc confirmation by CYP21A2 analysis) were all assigned to the NCCAH group in our model.
Table 2. Basal and peak (after ACTH test) steroid concentrations (in ng/ml) of 17-OHP, 21-DF, F, and genotypes of NCCAH patients. Patients 1-8 belonged to the training set and patients 9-13 to the validation set.
Therefore, thanks to the OPLS model integrating steroid fingerprint and clinical data, the classification of NCCAH patients and PP patients was optimal, with a validation accuracy of 100%.
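The quantities summarized in a misclassification table can be reproduced with a short sketch. It counts the four cells of the 2 × 2 table, derives sensitivity and specificity, and computes the hypergeometric probability of observing exactly that table with fixed margins, which is one reading of "Fisher's probability"; how SIMCA computes the reported value is not specified here. The function names are hypothetical, and the example uses the validation-set counts stated in the text (5 NCCAH, 34 PP, all correctly classified).

```python
from math import comb


def confusion_counts(true_labels, predicted_labels, positive="NCCAH"):
    """Return (tp, fn, fp, tn) for a two-class problem."""
    tp = fn = fp = tn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive and predicted == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def sensitivity(tp, fn):
    """Fraction of true positives that were detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of true negatives that were correctly rejected."""
    return tn / (tn + fp)


def table_probability(tp, fn, fp, tn):
    """Hypergeometric probability of this exact table given its margins."""
    n = tp + fn + fp + tn
    return comb(tp + fn, tp) * comb(fp + tn, fp) / comb(n, tp + fp)


# Validation set as described in the text: 5 NCCAH and 34 PP, no errors.
TRUTH = ["NCCAH"] * 5 + ["PP"] * 34
PREDICTED = list(TRUTH)
TP, FN, FP, TN = confusion_counts(TRUTH, PREDICTED)
P_TABLE = table_probability(TP, FN, FP, TN)
```

For a perfectly classified table, this probability is the reciprocal of the number of ways to choose the NCCAH patients from the whole set, which is why it falls far below 0.05 even with only five positive cases.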

Discussion
The objective of the present study was to develop, based on a mathematical ML model, a score including basal circulating steroid profile and clinical data to avoid the use of ACTH testing in the differential diagnosis of NCCAH in a pediatric population.
Indeed, ACTH tests are useful in detecting AI in children with NCCAH. In our study, similar to other pediatric studies [30][31][32], a high number of NCCAH patients presented a cortisol peak below 180 ng/ml (12 NCCAH patients out of 13). These data underlined the importance of glucocorticoid stress dose in children undergoing stressful situations like surgery or illness.
Our findings uncovered clinical and biological differences between children with premature pubic hair and NCCAH with significant results for BA and basal levels of 21-DB, 17-OHP, 21-DF, A4, T, Preg, and 11OHA4. There were no overlaps for 21-DB and 21-DF (concentration threshold = 12.42 pg/ml), or for 17-OHP (concentration threshold = 3.62 ng/ml). This was consistent with previous studies, in particular for 21-deoxysteroids [19,20,22]. Indeed, 21-DF and 21-DB are strictly adrenal metabolites and therefore represent potentially more specific biomarkers for NCCAH diagnoses than 17-OHP, which has a dual ovarian and adrenal origin [22,43,44]. Some authors have already proposed a threshold of basal 17-OHP [34], but this remains controversial [32,33]. With the validated mass spectrometry approach (LC-MS/MS), we were able to routinely and simultaneously quantify 16 circulating steroids in 10 min via a direct measurement method with excellent sensitivity and specificity (certified by the French national quality control organization). We can thereby attribute a specific steroidome, hormonal signature, or "fingerprint" to each patient [22].
In our patients, basal measurements of 17-OHP were surprisingly elevated, and could already distinguish NCCAH patients from PP patients in univariate analysis. This was not always the case for our patients, for whom a threshold value of 17-OHP was always discussed, presumably because our pediatric patients had more severe features than late onset adrenal hyperplasia diagnosed in adulthood and because they were bearing a severe mutation in 54% of the cases. However, the use of 17-OHP, which characterizes 21-OHD, is not without criticism [35]. False negative rates of up to 22% have been reported in infant screening [36], particularly when mothers were exposed to glucocorticoids prenatally [37]. False positives in some premature infants or in other forms of CAH including 11-hydroxylase deficiency were also reported [38]. Moreover, a single basal 17-OHP concentration is often insufficient to diagnose non-classic 21-OHD carriers [39]. The use of mass spectrometry gives a more specific view of adrenal steroid levels in 21-OHD compared with immunoassays [40]. Furthermore, in terms of cost, according to our own experience in our clinical laboratory, a panel of about 16 steroids can be more advantageous than relying on specific expensive immunoassay kits. This method is more specific and more sensitive than radioimmunoassay (RIA) because it allows the quantification of steroid profiles including 21-DF within 150 µL of serum. It also allows the measurement of other steroids such as 21-DB, which is particularly informative for CAH. To our knowledge, 21-DB has not been extensively studied until now.
Table 4. Model prediction results. Assignment in each class (NCCAH or PP) of the patients from the training set and the validation set. The predictive results for the training set were obtained using cross-validation. a: diagnosis made with post-hoc confirmation by CYP21A2 analysis.
According to our previous data, plasma basal 21-DB concentrations measured by RIA [41], and more recently by LC-MS/MS [22], could represent an interesting additional biomarker to identify patients with NCCAH. Although 17-OHP is usually quantified on its own, the addition and combination of new strictly adrenal steroids, such as 21-DF and 21-DB, could enhance the specificity for the diagnosis of NCCAH. Miller et al. even suggest replacing 17-OHP with 21-DF in the newborn screening program [35].

With the ability to routinely and simultaneously quantify 16 circulating steroids, we highlighted the use of some steroids not often used in practice (such as 21-DB with no overlap) and other better-known steroids (like 21-DF). 21-DF has already been evaluated in children with 21-OHD, and was deemed an excellent specific marker of this disease [20]. 21-DF is useful in the diagnosis and the follow-up of NCCAH and for the detection of heterozygotes [42].
Determining the status of heterozygotes and NCCAH is very important, especially in those carrying severe alleles, for genetic counseling to anticipate the risk of AI and genital ambiguity at birth [43]. In our study, 54% of NCCAH children had a severe mutation. It is further essential to explain the screening to the proband's family and the future partner in family planning [44,45].
A non-classical form of mutation (M283V of the NCCAH patient no. 10, Table 1) has been described in just one case of a patient with NCCAH [46]. Notably, all of our NCCAH patients had the V281L mutation, the most common mutation in patients with NCCAH [47]. Three had the IVS2-13 C > G allele, the most frequent mutation in classical forms [9].
Our findings demonstrated a difference in clinical features and advanced BA, as is already known in the literature [7]. Even though it relies on subjective methods, BA assessment is easy to perform routinely in clinical settings [26]. In our study, all children with NCCAH demonstrated advanced BA and six had advanced BA of more than two years. Some reports propose glucocorticoid therapy for these patients to avoid a short final stature [11]. In combination with biological measurements, BA was an important part of our designed score for the diagnosis of NCCAH, suggesting that an advanced BA may be the only difference between clinical features of NCCAH patients and PP [34]. We limited our study to prepubertal children from 7 to 9 years old (mean age 7.14 years (+/− 2.32)) to focus on adrenal steroids responsible for adrenarche and to free ourselves from sex variations induced by gonadal puberty. Analysis of steroids only secreted by the adrenal gland could be a further area of study [47].
The model we developed predicted with 100% sensitivity and specificity the diagnoses of NCCAH in a population of children with premature pubic hair (n = 97), and could probably be extended to older populations with mild features. The constitution of a training set (first year) and a validation set (second year) allowed us to evaluate the robustness of our model more accurately than with a simple internal cross-validation. Multivariate data analysis [48] and ML methods have already been evaluated in adrenal diseases and can shorten diagnosis in some cases [49,50]. In this study, OPLS-DA analysis was chosen as the ML algorithm since our dataset included many steroids with potentially high collinearity. We trained and evaluated 5 other ML models (support vector machine, nearest neighbors classifier, decision tree, random forest, and Gaussian Naive Bayes), which confirmed the discriminative power of the data, independently of the chosen method (not shown). However, their performances were either worse than or equivalent to those of OPLS-DA. Moreover, this approach gave us the opportunity to present the data on 2D score plots, creating a bird's eye view to summarize and study the degree of similarity (or dissimilarity) of the patients' bio-clinical signatures.
In the future, multivariate mathematical approaches such as these, which can integrate anthropometric data (gender, age, BMI, etc.), biological measurements, and even radiological images, could generate a unique fingerprint for each patient and offer new possibilities for the diagnostic toolbox. Model training iteration processes could eventually be performed [51,52]. Indeed, with the selected features, additional patients could be included to build refined models over time. The proposed model could therefore find its place in the care management pathway of children with premature pubarche and late onset 21-OHD. The method may be extended to higher ages; this work is in progress with adult patients from non-pediatric endocrinology units, considering age and sex references for adult patients. Prospective interventional studies will be needed to validate this model for clinical use [50].

Conclusion
To conclude, LC-MS/MS enabled assignment of a metabolic fingerprint to each patient. Combining this with a statistical model allowed the construction of an NCCAH diagnostic score with 100% sensitivity and specificity in our cohort. The most significant variables were 21-DB, 17-OHP, and 21-DF. If this score were routinely implemented, use of the ACTH test could be restricted to screening patients for AI, which, based on the prevalence of NCCAH, would be nearly one in ten children. Genetic analysis of CYP21A2 would remain essential in NCCAH patients to identify severe mutations. Further studies are needed for the validation of this score, particularly with the use of a larger multicenter prospective cohort.