Post-Operative Medium- and Long-Term Endocrine Outcomes in Patients with Non-Functioning Pituitary Adenomas—Machine Learning Analysis

Simple Summary Non-functioning pituitary adenomas (NFPAs) may present with hypopituitarism, and patients may develop or continue to exhibit hypopituitarism following surgery or radiotherapy to treat NFPAs. Panhypopituitarism is characterised as a deficiency in most or all pituitary hormones. Prediction of post-intervention hypopituitarism is currently limited, with no accurate models existing for medium- to long-term prognosis. The aim of this study was to develop machine learning (ML) models towards improving prediction of hypopituitarism up to 15 years following surgical intervention for NFPAs. Pre-operative hormone levels were shown to be the best predictor of panhypopituitarism up to 1 year post-operatively. Endocrine tests performed up to 1 year post-operatively were shown to support strong predictive models for assessing the probability of panhypopituitarism at 5 and 10 years post-operatively. Abstract Post-operative endocrine outcomes in patients with non-functioning pituitary adenoma (NFPA) are variable. The aim of this study was to use machine learning (ML) models to better predict medium- and long-term post-operative hypopituitarism in patients with NFPAs. We included data from 383 patients who underwent surgery with or without radiotherapy for NFPAs, with a follow-up period between 6 months and 15 years. ML models, including k-nearest neighbour (KNN), support vector machine (SVM), and decision tree models, showed a superior ability to predict panhypopituitarism compared with non-parametric statistical modelling (mean accuracy: 0.89; mean AUC-ROC: 0.79), with SVM achieving the highest performance (mean accuracy: 0.94; mean AUC-ROC: 0.88). Pre-operative endocrine function was the strongest feature for predicting panhypopituitarism within 1 year post-operatively, while endocrine outcomes at 1 year post-operatively supported strong predictions of panhypopituitarism at 5 and 10 years post-operatively. Other features found to contribute to panhypopituitarism prediction were age, volume of tumour, and the use of radiotherapy. In conclusion, our study demonstrates that ML models show potential in predicting post-operative panhypopituitarism in the medium and long term in patients with NFPM. Future work will include incorporating additional, more granular data, including imaging and operative video data, across multiple centres.


Introduction
Non-functioning pituitary adenomas (NFPAs) are well-differentiated tumours arising from adenohypophysis of the pituitary gland, accounting for up to 54% of all pituitary adenomas [1]. Clinical presentation of these tumours varies, due to their lack of hormonal secretion and clinical syndromes, from being incidentally detected on radiological studies performed for other conditions to hypopituitarism and visual deficits secondary to mass effect on the pituitary gland and optic apparatus, respectively [2][3][4].
Hypopituitarism is defined as a deficiency of one or more pituitary hormones. A substantial number of patients with NFPAs develop a degree of hypopituitarism due to mechanical compression of the pituitary gland and reduction of pituitary portal circulation [1,5]. In addition, treatment with surgery with and without adjuvant radiotherapy may increase the risk of developing further pituitary impairment [4,[6][7][8]. Currently, there are limited predictive models for post-operative endocrine outcomes in patients with NFPAs.
Previous studies have demonstrated the application of machine learning (ML) models in predicting individual pituitary hormone deficiencies immediately following transsphenoidal surgery for NFPAs [9]. The aim of this study was to extend this work by applying ML models to a larger cohort of patients with medium-and long-term follow-up.

Materials and Methods
Ethics approval to conduct this study was obtained from Westminster Research Ethics Committee on 7 April 2020. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) Statement was used in the preparation of this manuscript [10]. This study was a single-centre cohort study that included all patients who underwent surgical resection for NFPAs of more than 1 cm between 1987 and 2017 and had a follow-up duration of more than six months. The study was conducted at the National Hospital for Neurology and Neurosurgery based in London, United Kingdom. Surgical resection was performed mainly by three experienced neurosurgeons. A retrospective review of medical case notes was performed. Diagnosis of NFPAs was based on the absence of evidence of functioning tumours, e.g., hormone excess production and the expression of pituitary hormones in the resected tumours on immunohistochemistry. Data on patients' demographics, endocrine assessment, treatment modalities, and tumour recurrence and regrowth were collected. A full list of fields and descriptions can be found in Appendix A.
Patients with severe growth hormone (GH) deficiency on dynamic testing (peak GH response < 3 µg/dL) or those on GH-replacement therapy were defined as GH-deficient [11]. Patients with combined thyroid stimulating hormone (TSH), adrenocorticotropic hormone (ACTH), and gonadotropins deficiencies without a dynamic test were also considered deficient in GH [12]. Patients with ACTH and TSH deficiencies were reported as deficient when patients received glucocorticoid and thyroxine therapy, respectively. In addition, patients who had a morning cortisol level of less than 100 nmol/L and those with a suboptimal response to dynamic testing were recorded as ACTH-deficient [13][14][15]. Patients with a low free T4 level with an inappropriately normal or low TSH level were recorded as TSH-deficient. Patients with primary hypothyroidism were not considered to have TSH impairment. Gonadotrophin deficiency was defined as follows: men were deficient if they were receiving testosterone therapy or if early morning testosterone level was low in the presence of inappropriately normal or low serum gonadotrophins [16]. Premenopausal women with amenorrhoea and inappropriately normal or low serum gonadotrophins were recorded as having hypogonadotrophic hypogonadism [17], while women who received oestrogen replacement in the form of hormone replacement therapy or oestrogen-containing contraceptive pills were also considered to be deficient in gonadotrophins. Postmenopausal women with non-elevated serum gonadotrophins were also documented as gonadotrophin deficient. Panhypopituitarism was considered when there was a deficiency of GH, gonadotrophins, ACTH, and TSH hormones. Patients on desmopressin replacement therapy were considered to have central diabetes insipidus (AVP deficiency).
For initial analysis, connections between surgeries and hormone deficiencies were investigated to draw broad connections between tumour behaviour following intervention and endocrine outcomes. Due to the established impact of radiotherapy on hormone deficiencies [18], an effect supported by this dataset, surgical interventions were initially investigated against the last valid endocrine test following the intervention, but before radiotherapy was administered, where applicable. For radiotherapy interventions, the last valid endocrine test following radiotherapy was used.
In preparation for the application of ML algorithms, more sophisticated timelines were created for each patient for endocrine and radiological outcomes. These timelines were necessary in order to provide a chronology of endocrine and radiological findings granular enough to provide clinical insight and standardised for analysis across the full patient population. An example of the logic used is shown in Figure 1. Various timeline options were tested, with final results built around 6 month intervals, taken from the date of the first surgery. To reduce the risk of inaccurate classification for endocrine tests conducted on the same date as the surgical intervention, endocrine results were determined based upon the closest endocrine test to the mid-point of each period. Radiological outcomes were determined based on the closest radiological investigation to the end date of each period. For the purposes of visualising hormone deficiency populations, percentages were calculated over known endocrine profiles, with unknown results excluded.
For initial analysis, connections between surgeries and hormone deficiencies were investigated to draw broad connections between tumour behaviour following intervention and endocrine outcomes. Due to the established impact of radiotherapy on hormone deficiencies [18], an effect supported by this dataset, surgical interventions were initially investigated against the last valid endocrine test following the intervention, but before radiotherapy was administered, where applicable. For radiotherapy interventions, the last valid endocrine test following radiotherapy was used.
In preparation for the application of ML algorithms, more sophisticated timelines were created for each patient for endocrine and radiological outcomes. These timelines were necessary in order to provide a chronology of endocrine and radiological findings granular enough to provide clinical insight and standardised for analysis across the full patient population. An example of the logic used is shown in Figure 1. Various timeline options were tested, with final results built around 6 month intervals, taken from the date of the first surgery. To reduce the risk of inaccurate classification for endocrine tests conducted on the same date as the surgical intervention, endocrine results were determined based upon the closest endocrine test to the mid-point of each period. Radiological outcomes were determined based on the closest radiological investigation to the end date of each period. For the purposes of visualising hormone deficiency populations, percentages were calculated over known endocrine profiles, with unknown results excluded. For the application of ML algorithms, the endocrine and radiological outcome timelines were then filtered to focus the analysis. To maximise clinical relevance, hormone periods were restricted to the first 5 years following the first surgery. ML algorithms were applied to each subsequent radiological period, up to a maximum of 15 years. All hormone deficiencies were investigated, including the compound conditions for panhypopituitarism, and panhypopituitarism combined with AVP deficiency. Again, for clinical purposes, radiological outcomes were filtered to remove secondary outcomes (e.g., optic contact/compression, cavernous invasion, etc.). Finally, supplementary features within the For the application of ML algorithms, the endocrine and radiological outcome timelines were then filtered to focus the analysis. To maximise clinical relevance, hormone periods were restricted to the first 5 years following the first surgery. ML algorithms were applied to each subsequent radiological period, up to a maximum of 15 years. All hormone deficiencies were investigated, including the compound conditions for panhypopituitarism, and panhypopituitarism combined with AVP deficiency. Again, for clinical purposes, radiological outcomes were filtered to remove secondary outcomes (e.g., optic contact/compression, cavernous invasion, etc.). Finally, supplementary features within the source data then had the option of being added to the ML dataset, normalising where appropriate (age, tumour volume). Any features with only two possible values (e.g., sex), were reduced to a single Boolean feature. Features with multiple results (e.g., Second Radiology Scan), were converted to two features for each possible result (e.g., Second Scan Complete Resection: True, Second Scan Compete Resection: False). This ensured that potential missing data points (i.e., features not recorded), did not add bias in the ML models, allowing the developed algorithms to select for both the existence of a result and the absence of the same result in determining the best predictive model. ML classification algorithms were applied with the aim of predicting post-operative panhypopituitarism up to 15 years following first surgery. Algorithms tested were simple logistic regression, k-nearest neighbour (KNN), support vector machine (SVM), and decision trees. These algorithms were selected due to their comparative strength in classification models over small datasets [19]. Due to the relatively small sample size, training data were prepared using 80% of patients, chosen at random. Model tests were then conducted against the remaining 20% forming the test dataset, and performance was captured against accuracy, F1, precision, recall, and area under the receiver operating characteristic curve (AUC-ROC). In each case, a confusion matrix was also generated.
To account for potential model instability, each algorithm was run 20 times, varying the random seed each time, with all associated metrics assessed as the mean and standard deviation of the resulting output. Results of the algorithms were assessed against the best predictive capability for panhypopituitarism at 6 months, 1 year, 5 years and 10 years post-operatively. For the decision tree algorithm, results were captured for feature importance, using the GINI index, allowing extraction of the most influential features for each outcome period.

Population Statistics
Simple population statistics for the patient cohort are shown in Table 1. Pre-operatively, 235 patients (61%) had evidence of at least one pituitary deficiency. Anterior panhypopituitarism was more common in men (p = 0.001) and older patients (p = 0.005).

Hormone Deficiency by Intervention
A heatmap of the percentage of patients exhibiting hormone deficiencies following each intervention is shown in Figure 2. Figure 3 shows the hormone deficiency rates as a function of time for the derived timelines, with specific rates shown at time of surgery and 6 months, 1 year, 5 years, and 10 years post-operatively, selected for clinical significance, included as Table A2 in Appendix A. Ten patients of those who underwent multiple surgeries without radiotherapy (number 46) had panhypopituitarism before secondary surgery; the rate rose to 12 patients at latest follow-up. In those treated with a combination of surgery and radiotherapy (number 65), 20 patients had panhypopituitarism after primary surgery; the incidence increased to 37 patients at latest clinical review.  Figure 3 shows the hormone deficiency rates as a function of time for the timelines, with specific rates shown at time of surgery and 6 months, 1 year, 5 yea 10 years post-operatively, selected for clinical significance, included as Table A2 pendix A. Ten patients of those who underwent multiple surgeries without radiot (number 46) had panhypopituitarism before secondary surgery; the rate rose to tients at latest follow-up. In those treated with a combination of surgery and radiot (number 65), 20 patients had panhypopituitarism after primary surgery; the incide creased to 37 patients at latest clinical review.    Figure 3 shows the hormone deficiency rates as a function of time for the derived timelines, with specific rates shown at time of surgery and 6 months, 1 year, 5 years, and 10 years post-operatively, selected for clinical significance, included as Table A2 in Appendix A. Ten patients of those who underwent multiple surgeries without radiotherapy (number 46) had panhypopituitarism before secondary surgery; the rate rose to 12 patients at latest follow-up. In those treated with a combination of surgery and radiotherapy (number 65), 20 patients had panhypopituitarism after primary surgery; the incidence increased to 37 patients at latest clinical review.

Machine Learning for Determining Post-Operative Panhypopituitarism
Model accuracy and AUC-ROC comparison between the selected algorithms is shown in Table 2 as mean and standard deviation across all prediction periods. With minimal tuning, SVM and decision tree models showed superior accuracy and AUC-ROC scores. For ease of extraction of feature importance, the decision tree model was further

Machine Learning for Determining Post-Operative Panhypopituitarism
Model accuracy and AUC-ROC comparison between the selected algorithms is shown in Table 2 as mean and standard deviation across all prediction periods. With minimal tuning, SVM and decision tree models showed superior accuracy and AUC-ROC scores. For ease of extraction of feature importance, the decision tree model was further analysed. For predictions up to 1 year post-operatively, pre-operative endocrine tests were included in the analysis for feature importance. For predictions at 5 and 10 years post-operatively, the most accurate early endocrine test period (up to one year following first surgery) was adopted for the analysis. The performance of the models when including the relative early endocrine test periods are shown in Table 3, with the feature importance for each future period shown in Table 4, in order of most to least important, excluding features with importance of less than 0.1, or where the standard deviation was equal to or greater than the feature importance.

Discussion
There is no general agreement on predictors of hypopituitarism following therapy. Patients receiving surgery with and without radiotherapy for pituitary adenomas require long-term clinical follow-up and biochemical evaluation to identify pituitary impairment and instigate hormone replacement when appropriate [20]. In this study, we assessed several patients and tumour characteristics using ML models to predict the occurrence of pituitary insufficiency in a large cohort of patients with NFPAs following transsphenoidal surgery and post-operative radiotherapy. The principal finding of this study is that the likelihood of developing hypopituitarism was strongly influenced by the presence of pre-operative pituitary hypofunction.
Hypopituitarism refers to partial or complete loss of hormones produced by adenohypophysis or neurohypophysis that can be caused by several aetiologies causing damage in the hypothalamic-pituitary region. Pituitary adenomas and treatment with surgery and radiotherapy are considered the most common causes of hypopituitarism [6,20,21]. Clinical symptoms and presentation can be variable, from subclinical disease to life-threatening emergencies, dependent on the degree of hormone deficiency. At presentation, pituitary adenomas of more than 1 cm are frequently associated with pituitary hormone deficiency resulting from mass effect on the normal anterior pituitary and the pituitary stalk, preventing the stimulation of pituitary cells by the hypothalamus with a prevalence of up to 85% of cases [1]. The process of developing hormone deficiency can be insidious, slowly evolving, and delay in diagnosis is often common. Long-term hormone replacement is often required to improve quality of life and reduce morbidity and mortality, with lifelong follow-up recommended [22,23].
Transsphenoidal hypophysectomy remains the gold standard for treating large NF-PAs [24,25], particularly when there is evidence of a pressure effect on the optic apparatus. The pituitary function can fluctuate following surgery and the risk of developing endocrine dysfunction post-operatively has been linked to many variables, including the size of the pituitary adenoma, the degree of surgical resection and manipulation, and dealing with recurrent disease [26]. The risk of deterioration in pituitary function ranges 0-36%, while recovery of deficient pituitary hormones varies 10-98% [6]. Hypothalamic pituitary dysfunction is one of the common late effects of pituitary radiotherapy. The occurrence of hypopituitarism following irradiation is progressive, and the severity of hormone deficits depends on many factors, including radiation dose, fraction size, the interval between fractions, and the duration of follow-up [8,27]. The exact mechanism is not very well understood; radiation-induced hypothalamic neural damage and secondary pituitary atrophy due to the loss of hypothalamic releasing and stimulating neurotransmitters are proposed to be contributing factors [28,29].
Panhypopituitarism, characterised here as simultaneous deficiencies across GH, gonadotrophins, ACTH and TSH, was present in 52 patients prior to surgery (13.6% of the total population; 15.8% of known endocrine profiles), and rose to 55 patients in the period immediately following surgery (14.4% of the population). This figure decreased marginally over the period 6-24 months after surgery, to a minimum of 49 patients, before steadily rising to 57 patients 10 years after initial surgery.
ML models showed strong performance in predictions of panhypopituitarism up to 15 years following initial surgery, with the strongest-performing models being SVM (mean accuracy: 0.94; mean AUC-ROC: 0.88) and decision tree (mean accuracy: 0.92; mean AUC-ROC: 0.84). Both SVM and decision tree models exhibited stronger predictive capability than standard logistic regression (mean accuracy: 0.89; mean AUC-ROC: 0.79).
The decision tree model showed strong performance in predicting panhypopituitarism up to 1 year following initial surgery, based upon endocrine profiles taken pre-operatively. On analysis of feature importance within the model, only pre-operative presentation of panhypopituitarism showed a strong contribution (feature importance of pre-operative panhypopituitarism presence/absence: 0.40/0.32 and 0.39/0.33 when predicting panhypopituitarism at 6 months post-operative and 1 year post-operative, respectively), with weaker contributions from patient age and tumour volume. The same model showed strong performance in the prediction of panhypopituitarism at 5 and 10 years following initial surgery, based upon endocrine profiles taken 1 year post-operatively. For these periods, only the prior presentation of panhypopituitarism (with or without AVP deficiency) contributed strongly to the model (feature importance of 1 year post-operative panhypopituitarism presence: 0.76 and 0.71 when predicting panhypopituitarism at 5 years post-operative and 10 years post-operative, respectively), with weak contribution from follow-up intervention with radiotherapy.
A recent study by Fang et al. [9], the only study that assessed pituitary outcomes using ML models after surgery, reported that pre-operative adenohypophysial endocrine impairment was associated with a higher risk of developing hormone deficiency postoperatively. Our study findings are in line with Fang et al.'s conclusion and extend ML experience of anticipating hypopituitarism in a large cohort of patients, of both genders, with NFPAs and a long follow-up duration. A limitation of the study concerns the inconsistent timing of endocrine assessments across the cohort and the resultant need to create standardised timelines algorithmically. Furthermore, the limitation of retrospective data collection applies.

Conclusions
ML models have been shown to provide stronger predictive performance than standard statistical methods when assessing patients' prognosis for hypopituitarism following surgical intervention to treat NFPAs. Investigation of a feature-rich patient dataset (383 cases), including population demographics, endocrine assessment, treatment modalities and results, and radiology scan results, indicated prior presentation of panhypopituitarism as the strongest predictor of post-operative panhypopituitarism. Minor influence was observed from the patient's age, tumour volume, and post-operative treatment with radiotherapy. ML models can be employed to predict endocrine outcomes in pituitary adenomas as demonstrated in this study. Further research is needed to develop accurate and reliable ML algorithms to aid in identifying prognostic markers in pituitary adenomas. Patients with pre-operative pituitary impairment in NFPAs are at a higher risk of persistent hypopituitarism following surgery or in combination with radiotherapy. This cohort of patients requires long-term endocrine monitoring to optimise hormone replacement and overall outcome. Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study will be made publicly available, upon the paper's acceptance, in the University College London institutional research repository (https://rdr.ucl.ac.uk/ (accessed on 15 May 2023)).

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.