Predicting The Risk and Timing of Bipolar Disorder In Offspring of Bipolar Parents: Exploring The Utility of a Neural Network Approach

Background: We explore the use of the Partial Logistic Artificial Neural Network (PLANN) to predict the time to diagnosis of bipolar spectrum disorders.


Background
Bipolar disorder affects an estimated 2.5% of the population, with higher prevalence for spectrum conditions 1. The onset peaks in late adolescence and early adulthood 2; however, delayed recognition and misdiagnosis remain a challenge. Untreated illness is associated with substantial morbidity and mortality early in the course 3, and therefore timely and accurate diagnosis of bipolar disorder is critical to facilitate prompt treatment. Bipolar disorder runs in families, and therefore the children of bipolar parents are an identifiable high-risk group ideally suited for risk prediction studies 4. Family studies have shown that the bipolar trait segregating in families includes major depressive disorder, bipolar I, II and schizoaffective bipolar disorder 5. The penetrance and spectrum vary between families and according to the subtype of illness. Furthermore, longitudinal prospective studies of high-risk offspring have provided strong evidence that the illness often debuts with depressive episodes 6.
While key risk factors for the development of bipolar disorder have been identified, such as characteristics of the parental age of onset and clinical course, early adversity, and antecedent clinically significant symptoms 7,8, translatable risk prediction tools for clinicians do not exist or are in the early stages of development.
Given the heterogeneity in age of onset, it is imperative to use survival models rather than logistic regression or classification methods. Time-varying covariates are exposure variables that can vary with time across individuals, such as level of anxiety or antecedent symptoms. Given the importance of antecedent risk factors contributing to the risk of bipolar disorder together with the variable age of onset, it is important to use methods which accommodate time-varying covariates, such as the Cox model or discrete-time survival models with time-dependent covariates 9. In addition, including time-varying covariates allows the model to use the most recently available information for each individual.
Risk prediction tools attempt to incorporate multiple risk factors into a single model to estimate the probability that an individual will develop an outcome in the future. More recently, the use of neural networks has become increasingly popular in research for risk prediction. The goal of neural networks is to learn the relationship between a set of predictors and response(s) (i.e. target outcome(s)). The building blocks of neural networks are known as nodes, which are organized into layers and connected to one another through weights. Feed-forward neural networks, a common type of neural network, often have an input layer, one or more hidden layer(s) and an output layer. The information is distributed through the neural network in one direction, beginning at the input layer and finishing at the output layer 10,11 (see supplemental material: Additional Methods - Neural networks and the discrete survival model).
Some advantages of neural networks are that they do not rely on strict assumptions and that they can accommodate non-linear relationships in the data 10,12. Recently, recurrent neural networks have been used for survival analysis applications with time-dependent covariates 13. Alternatively, discrete survival analysis has been extended to the field of neural networks in order to accommodate time-dependent covariates in prediction of survival 10,14-16. The neural networks which extend discrete survival analysis lead to simpler implementation and interpretation compared to more complex methods such as recurrent neural networks. That being said, the Ohno-Machado (1996) 15 and Ohno-Machado (1995) 16 approaches involve the use of multiple neural networks, which adds complexity and can become computationally intensive.
The purpose of this article is to explore the use of a neural network, known as the Partial Logistic Artificial Neural Network (PLANN) 10, to predict the time to diagnosis of bipolar spectrum disorders in the offspring of parents with confirmed bipolar disorder. PLANN is based on the logistic model for discrete survival analysis 9. In this paper, we compare the two approaches. Both PLANN and the logistic model for discrete survival analysis predict the probability of an individual experiencing an event within a given time frame conditional on the individual not yet having experienced the event, which can be useful information for clinicians. The prediction of which offspring are at greater risk of bipolar disorder over time may allow for more proactive monitoring and prevention (reducing stress, improving sleep, healthy lifestyle choices).
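The relationship between the conditional probabilities described above and the probability of remaining undiagnosed can be illustrated with a minimal sketch. It assumes the standard discrete-time identity in which the survival probability after interval k is the product of (1 - hazard) over intervals 1 to k. The function name and the hazard values are hypothetical and are not taken from the study.

```python
import numpy as np

def survival_from_hazards(hazards):
    """Convert conditional (discrete) hazards h_1..h_K into survival probabilities.

    S(t_k) = prod_{j <= k} (1 - h_j), i.e. the probability of remaining
    undiagnosed through the end of interval k.
    """
    hazards = np.asarray(hazards, dtype=float)
    return np.cumprod(1.0 - hazards)

# Hypothetical hazards for three consecutive intervals (illustrative values only).
hazards = [0.10, 0.20, 0.50]
survival = survival_from_hazards(hazards)
print(survival)
```

Either model's per-interval output could be passed through a function like this to obtain a predicted survival curve for an individual.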

Study design
For this study, we used the data collected as part of the ongoing Canadian longitudinal high-risk offspring study described in detail elsewhere (Duffy et al., BJP, AJP) 6,17. The study design is a dynamic, prospective cohort study. Briefly, original study families were identified through parents with bipolar I disorder confirmed by SADS-L interview and blind consensus review of all available clinical information. Subsequently, pedigrees were expanded to include first-degree relatives of the original probands who themselves were affected with bipolar spectrum disorders (bipolar I, II, recurrent major depression). Agreeable offspring aged 5-25 years were enrolled, completed face-to-face research interviews following KSADS-PL format and study measures at baseline, and were then followed up prospectively, on average annually. This study has been reviewed for ethical compliance by the Ottawa Independent Research Ethics Board and the Queen's University Health Science Research Ethics Board.

Characteristics of participants
In this analysis, we included 304 high-risk offspring from the Canadian high-risk cohort. The final data analysis was based on 292 high-risk individuals with no missing data for the predictors of interest. The outcome was defined as a DSM-V diagnosis of bipolar disorder (Bipolar I, II, NOS), major depressive disorder and/or schizoaffective disorder based on semi-structured KSADS-PL format interviews and blind consensus review based on all available clinical and research material.
We limited variables in the model to those that would be relevant and routinely collected by clinicians in an office setting (i.e., sex at birth, age at last observation, history of childhood abuse, antecedent clinically significant symptoms and non-mood syndromes). Clinically significant hypomanic, depressive and anxiety symptoms falling short of diagnostic criteria, as well as substance misuse and sleep problems, were quantified based on clinical research interview and previously published consensus criteria 6. Childhood physical and sexual abuse was determined in offspring 13+ years of age using the Childhood Experiences of Care and Abuse Scale 18, while the (Children's) Global Assessment of Functioning 19 was used at each assessment.

Statistical analysis
We evaluated the ability of the Partial Logistic Artificial Neural Network (PLANN) 10 to predict the time to diagnosis of bipolar spectrum disorders and compared this approach to the more traditional logistic model for discrete survival analysis 9. Models were evaluated using the time-dependent c-index 20, the Brier score, and additional common measures of prediction accuracy. The time-dependent c-index is a measure of discrimination performance, that is, how well the model can rank individuals on their time to developing the outcome (see supplemental material: Additional Methods - Model Evaluation).
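To make the idea of ranking concrete, the following is a simplified sketch of a time-dependent concordance calculation for discrete intervals. It is not the exact estimator used in the study. It assumes that a pair of individuals is comparable when one is diagnosed in an interval and the other is still event-free after that interval, that the pair is concordant when the diagnosed individual has the lower predicted survival probability at that interval, and that ties count as half (a common convention, assumed here). All names and values are hypothetical.

```python
import numpy as np

def time_dependent_c_index(event_interval, event, survival):
    """Simplified time-dependent concordance for discrete intervals.

    event_interval[i]: 0-based index of the last observed interval for subject i
    event[i]:          1 if subject i was diagnosed in that interval, else 0
    survival[i, k]:    predicted probability of being undiagnosed after interval k
    """
    event_interval = np.asarray(event_interval)
    event = np.asarray(event)
    survival = np.asarray(survival, dtype=float)
    concordant, comparable = 0.0, 0
    n = len(event_interval)
    for i in range(n):
        if event[i] != 1:
            continue  # pairs are anchored on a subject with an observed event
        k = event_interval[i]
        for j in range(n):
            if event_interval[j] > k:  # subject j was still event-free after interval k
                comparable += 1
                if survival[i, k] < survival[j, k]:
                    concordant += 1.0
                elif survival[i, k] == survival[j, k]:
                    concordant += 0.5  # ties count as half (assumed convention)
    return concordant / comparable if comparable else float("nan")

# Hypothetical example: subjects 0 and 1 are diagnosed, subject 2 is censored.
event_interval = [0, 1, 2]
event = [1, 1, 0]
predicted_survival = np.array([
    [0.5, 0.3, 0.2],
    [0.8, 0.5, 0.4],
    [0.9, 0.8, 0.7],
])
c_index = time_dependent_c_index(event_interval, event, predicted_survival)
print(c_index)
```

A value of 0.5 corresponds to ranking no better than chance, and a value of 1 to perfect ranking of the comparable pairs.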
The Brier score 21 is a measure of calibration performance, which measures how well the model predicts the observed response. The Brier score measures the difference between the predicted probability of the event not occurring by a given follow-up time and the observed status of the individual at that time. The Brier score ranges between 0 and 1, with 0 indicating perfect calibration and 0.25 indicating a noninformative model that is no better than chance. Additional common measures of prediction accuracy were used, including the true positive rate (sensitivity), false positive rate (1 − specificity) and positive predictive value (the probability that those predicted to have the outcome actually experienced the outcome). We evaluated the predictive performance of the two approaches when the follow-up time was divided into one-year, three-year, and five-year time intervals.
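The measures described above can be sketched as follows. This is a minimal illustration that ignores censoring: the Brier score here is the plain mean squared difference between predicted survival probability and observed event-free status at one follow-up time, which is a simplifying assumption and may differ from the estimator used in the study. Function names and inputs are hypothetical.

```python
import numpy as np

def brier_score(pred_survival, still_event_free):
    """Mean squared difference between predicted survival and observed status (1 = event-free)."""
    pred = np.asarray(pred_survival, dtype=float)
    obs = np.asarray(still_event_free, dtype=float)
    return float(np.mean((pred - obs) ** 2))

def _ratio(numerator, denominator):
    # Return NaN rather than raising when a rate is undefined.
    return float(numerator) / float(denominator) if denominator else float("nan")

def classification_rates(predicted_event, observed_event):
    """True positive rate, false positive rate and positive predictive value."""
    pred = np.asarray(predicted_event, dtype=bool)
    obs = np.asarray(observed_event, dtype=bool)
    tp = np.sum(pred & obs)
    fp = np.sum(pred & ~obs)
    fn = np.sum(~pred & obs)
    tn = np.sum(~pred & ~obs)
    return {
        "tpr": _ratio(tp, tp + fn),  # sensitivity
        "fpr": _ratio(fp, fp + tn),  # 1 - specificity
        "ppv": _ratio(tp, tp + fp),  # positive predictive value
    }

# A constant prediction of 0.5 gives the noninformative value of 0.25.
noninformative = brier_score([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
rates = classification_rates([1, 1, 1, 0], [1, 1, 0, 0])
print(noninformative, rates)
```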
Cross-validation is a technique used to evaluate how the results might generalize to an independent data set. For all these measures we used 10-fold cross-validation, which divides the data set into 10 subsets. The model is trained on (fit to) the data 10 times, each time leaving out a different subset which is used as the test set, on which the evaluation measures are calculated. Finally, the evaluation measures are averaged across the 10 folds 22.
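The procedure can be sketched as below. This is an illustration only: it assumes folds are formed at the level of the individual, so that all intervals from one person fall in the same fold, and the `fit_and_score` callback is a hypothetical placeholder for training a model and computing one evaluation measure.

```python
import numpy as np

def k_fold_indices(n_subjects, k=10, seed=0):
    """Shuffle subject indices and split them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_subjects)
    return np.array_split(shuffled, k)

def cross_validate(n_subjects, fit_and_score, k=10, seed=0):
    """Average a score over k folds, each fold serving once as the test set."""
    folds = k_fold_indices(n_subjects, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        # Train on every fold except the i-th one.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        scores.append(fit_and_score(train_idx, test_idx))
    return float(np.mean(scores))

# 292 is the analysed sample size reported in the text; the scorer is a dummy.
folds = k_fold_indices(292, k=10)
mean_score = cross_validate(292, lambda train, test: len(train) + len(test))
print([len(f) for f in folds], mean_score)
```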

Predictor variables
The time-fixed predictors included: sex (male, female), parental response to lithium prophylaxis (yes, no), parental age of onset (first meeting bipolar diagnosis) and physical/sexual abuse (yes, no, unsure). In addition, over the follow-up period, time-dependent measures were recorded for the individuals. Binary time-dependent variables indicating diagnosis of subthreshold symptoms or full-threshold clinical diagnoses of various disorders (i.e. subthreshold activation, subthreshold depression, subthreshold sleep, subthreshold substance use, subthreshold anxiety, substance use, sleep, anxiety, and neurodevelopmental) were each equal to 0 prior to diagnosis and 1 after diagnosis. In addition, the cumulative number of major mood and minor mood episodes were measured for individuals over the follow-up period.
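The coding of a binary time-dependent variable can be illustrated with a short sketch. It assumes the indicator switches to 1 in the interval where the diagnosis is first recorded and stays at 1 afterwards; whether the onset interval itself is coded 1 is an assumption here, not something stated in the text. The function name is hypothetical.

```python
def time_dependent_indicator(onset_interval, n_intervals):
    """Return one 0/1 value per interval for a binary time-dependent covariate.

    onset_interval: 1-based interval of first diagnosis, or None if never diagnosed.
    n_intervals:    number of intervals in which the individual was observed.
    """
    return [
        0 if onset_interval is None or k < onset_interval else 1
        for k in range(1, n_intervals + 1)
    ]

# Hypothetical individual observed for five intervals, diagnosed in the third.
anxiety = time_dependent_indicator(3, 5)
never = time_dependent_indicator(None, 3)
print(anxiety, never)
```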

Partial Logistic Artificial Neural Network (PLANN)
A neural network approach to discrete survival analysis was developed by Biganzoli et al. (1998) 10. For training the network, the observed or target response must be known. The target response of the neural network is the event indicator (outcome of interest, i.e. bipolar spectrum disorder), δ_ik, which is equal to 1 if the event occurs for the ith subject in the kth time interval and 0 if the event does not occur. For censored individuals, the target response is equal to 0 for each time interval in which the subject is observed. For uncensored subjects, the target response is equal to 1 for the time interval in which the event occurs and is equal to 0 for the previous time intervals 10.
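The construction of the target response can be sketched as an expansion of each subject into one row per observed interval. This is a minimal illustration of the rule described above, not the authors' code. The function name and the subjects are hypothetical, and the interval index is kept in each row on the assumption that it is needed downstream as a model input.

```python
def person_period_rows(subject_id, n_intervals_observed, event):
    """Expand one subject into one row per observed time interval.

    Returns a list of (subject_id, interval_index, target) tuples, where the
    target is 1 only in the final interval of a subject who had the event.
    """
    rows = []
    for k in range(1, n_intervals_observed + 1):
        is_event_interval = event and k == n_intervals_observed
        rows.append((subject_id, k, 1 if is_event_interval else 0))
    return rows

# Hypothetical subjects: A is diagnosed in interval 3, B is censored after interval 2.
rows_a = person_period_rows("A", 3, event=True)
rows_b = person_period_rows("B", 2, event=False)
print(rows_a)
print(rows_b)
```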
Hyper-parameters are parameters of the neural network which are selected by the researcher prior to training the neural network. We selected the hyper-parameters which enhanced discrimination performance via the time-dependent c-index. The learning rate, momentum, and the ridge regularization parameter were the hyper-parameters that were optimized. The learning rate and momentum are both used in the process of minimizing the loss function 23, and the ridge regularization parameter determines to what degree the weights will be shrunk towards zero in order to avoid overfitting the model to the training set 24.

Results
Table 1 presents the percent observed or means of the predictors included in the analyses. In total, 112 (38.36%) individuals developed bipolar spectrum disorder (outcome of interest) by last observation, while 180 (61.64%) did not and were censored. Figure 1 presents the Kaplan-Meier curve showing the probability that the outcome has not occurred by the age on the x-axis.

Predictive model using PLANN
The hyper-parameters selected for one-year, three-year and five-year predictions can be found in Table 2.
The selected combination of hyper-parameters attained a mean time-dependent c-index of 0.6294, 0.5700, and 0.5841 during hyper-parameter optimization for one-year, three-year and five-year predictions, respectively. For one-year predictions, PLANN performed well in terms of discrimination performance with a mean time-dependent c-index of 0.6325 across the 10 folds, indicating that the model can rank individuals better than chance. For three and five years, the mean time-dependent c-index across 10 folds was 0.5468 and 0.5902, respectively, indicating relatively weaker performance at three and five years.
The results for the more traditional prediction measures such as accuracy, false positive rate (FPR), true positive rate (TPR), and positive predictive value (PPV) can be seen in Supplemental Tables 1-3. Predicted survival curves for the 'earlier-onset', 'mid-onset' and 'no onset' individuals are shown in Fig. 2. As seen in Fig. 2, PLANN predicted that the 'earlier-onset' individual had the lowest survival probability over time. When making one-year predictions, PLANN could predict that the 'mid-onset' individual had a lower survival probability than the 'no onset' individual. However, PLANN had more difficulty distinguishing between the 'mid-onset' and 'no onset' individuals when three and five-year predictions were made.

Logistic Model for Discrete Survival Analysis
For comparison with PLANN, the logistic model for discrete survival analysis was assessed in terms of its predictive performance for one, three and five-year predictions. The mean time-dependent c-index and Brier scores across the 10 folds can be found in Table 3. In terms of discrimination performance (i.e. ranking individuals based on their risk), the logistic model performed only slightly better than chance when making one-year, three-year and five-year predictions, as indicated by the fact that the time-dependent c-indices were each close to 0.5. The logistic model had the best discrimination performance (i.e. highest c-index) when making one-year predictions compared to three and five-year predictions.
The more traditional measures of prediction can be seen in Supplemental Tables 4-6 for one-year, three-year, and five-year intervals, respectively. As can be seen, the mean accuracy across all time intervals increased as the time interval width increased. For one-year predictions, the time intervals which had higher true positive rates additionally had higher false positive rates, and the positive predictive values were all fairly low for the logistic model with one-year predictions. Overall, the false positive rates dropped when the time interval width increased.
Finally, it was of interest to assess whether the logistic model could distinguish between the 'earlier-onset', 'mid-onset' and 'no onset' individuals on a single test set, defined previously. For these individuals, the predicted survival curves were plotted for the logistic model with one-year predictions, three-year predictions and five-year predictions. See Fig. 3 for the predicted survival curves. As seen in Fig. 3, the logistic model predicted that the 'earlier-onset' individual had a higher probability of diagnosis (i.e. lower survival probability) over time compared to the 'mid-onset' and 'no onset' individuals. However, for one-year, three-year and five-year predictions, the logistic model predicted that the 'mid-onset' individual had a higher probability of not being diagnosed than the 'no onset' individual.

Discussion
In this study we explored the potential utility of using the partial logistic artificial neural network (PLANN), an extension of discrete survival analysis, to predict time to diagnosis of bipolar disorder at 1, 3 and 5 years into the future in a well-characterized, prospectively followed cohort of high-risk individuals identified based on a parent with bipolar spectrum disorder. We limited fixed and time-varying covariates in the model to data that would be routinely collected and available in clinical practice (i.e., sex, age, childhood abuse, subthreshold antecedent clinically significant symptoms and lifetime antecedent non-mood diagnoses).
PLANN was compared to a traditional logistic model for discrete survival analysis to assess whether the use of a neural network provides any benefit over a traditional statistical modeling approach. While PLANN and the logistic model have common advantages, such as enabling the incorporation of time-varying covariates due to the use of discrete time intervals, each model also has distinct advantages over the other. The logistic model allows for the interpretation of the effect of covariates on the discrete hazard and the evaluation of whether or not the covariates have a significant effect on the discrete hazard. That is, it is possible to estimate the magnitude of effect of each unique predictor on the outcome, which is not possible using PLANN. On the other hand, PLANN has the ability to automatically detect non-linear relationships in the data. The importance of being able to automatically detect non-linear relationships in data, which is not possible with logistic models, is apparent for certain outcomes in bipolar disorder. For example, mood instability has been found to follow non-linear patterns 25.
Overall, for predictive performance, PLANN outperformed the logistic model for one-year, three-year and five-year predictions. PLANN was better able to discriminate or rank individuals based on their risk of developing bipolar disorder (i.e., higher time-dependent c-indices), better able to predict the probability of developing bipolar disorder (i.e., lower Brier scores) and had higher accuracy than the logistic model.
Both PLANN and the logistic model performed better in terms of discrimination (i.e. time-dependent c-index) and calibration performance (i.e. Brier scores) for more proximal predictions (i.e., one-year), compared to more distal predictions (three and five-year). Moreover, the calibration performance deteriorated over time, with poor performance for more distal predictions of survival probability (i.e. 25 years) compared to more proximal predictions (i.e. 15 years) for all time interval widths. This finding was corroborated when examining how well the models could distinguish an individual who had an earlier-onset diagnosis (12 years) from individuals who did not experience a diagnosis (40 years) or experienced a mid-onset diagnosis (20 years) in terms of their survival probability. Both models predicted that the earlier-onset (i.e. higher risk) individual had the lowest survival probability, but both had difficulty in distinguishing between the mid-onset (i.e. medium risk) and no onset (i.e. lower risk) individuals, who had longer survival times.
Interestingly, the three and five-year models had higher true positive rates and higher positive predictive values compared to the one-year models. This difference in results can potentially be explained by the fact that the time-dependent c-index and the Brier score use the predicted probability of not being diagnosed by given follow-up times, whereas the false positive rate, true positive rate and positive predictive value use the conditional probability of diagnosis within each time interval. Therefore, when making predictions of the probability of not being diagnosed at given follow-up times, the one-year predictions are preferable, but if one is interested in the conditional risk within a given time interval, then wider time intervals such as three or five years are preferable.
Our risk prediction approach of using PLANN to predict onset of bipolar disorder differs from other published risk calculators (e.g., Hafeman et al. (2017) 26) that have used a "baseline re-setting" Cox proportional hazards model. While both methods allow the inclusion of covariates measured at baseline and at follow-up visits and neither method requires an assumption about the distribution, the PLANN approach we took does not require a proportional hazards assumption. In addition, we only included model variables that would be available in routine practice.

Strengths and Limitations
Strengths include the carefully assessed parental diagnoses based on longitudinal clinical observations confirming the risk status in the offspring, and the measurement of diagnosis in high-risk offspring through semi-structured research clinical assessments and blind consensus reviews. However, the following limitations relevant to this analysis are worth noting. The sample size is small; additional breadth of data (e.g., genetic data, behavioural data) could have improved predictions; and external replication is needed.

Conclusion
This evaluation of PLANN is a useful step in the investigation of using neural networks as tools in the prediction of diagnosis of mental health for at-risk individuals and demonstrated the potential that neural networks have in this field. PLANN performed better than the traditional discrete time survival model in predicting the development of bipolar disorder in high-risk individuals. However, both approaches struggled in making more distal predictions into the future. Future research replicating these approaches in different samples with the inclusion of additional data will help inform the further utility of risk prediction models to aid in clinical decision making in patients with bipolar disorder.

Consent for publication: All named authors have provided consent for publication.
Availability of data: This is an ongoing study and requests for access to analyze de-identified study data may be made to the principal investigator Dr Anne Duffy as per ethically reviewed study protocol.

Figure 1: Kaplan-Meier curve for diagnosis of bipolar disorder, major depressive disorder and/or schizoaffective disorder for at-risk individuals.