Developing an individualized risk calculator for psychopathology among young people victimized during childhood: A population-representative cohort study

Background: Victimized children are at greater risk for psychopathology than non-victimized peers. However, not all victimized children develop psychiatric disorders, and accurately identifying which victimized children are at greatest risk for psychopathology is important to provide targeted interventions. This study sought to develop and internally validate individualized risk prediction models for psychopathology among victimized


Introduction
Childhood victimization is associated with a range of internalizing (Nanni et al., 2012; Takizawa et al., 2014), externalizing (Braga et al., 2017; Capusan et al., 2016), and psychotic disorders, likely reflecting a general vulnerability for psychopathology in victimized children (Schaefer et al., 2018). Yet, not all victimized children develop psychopathology (Rutter, 2013). Accurately identifying which victimized children are at greatest risk for psychopathology is therefore important, in order to provide targeted support and to inform rational allocation of resources. Previous studies have described individual-, family-, and community-level factors that increase or decrease risk of psychopathology among victimized children (i.e., vulnerability or resilience factors, respectively), and thus, may improve risk detection (Fritz et al., 2018; Meng et al., 2018). However, these studies also typically had important limitations: they were based on cross-sectional or short-term longitudinal designs, making it difficult to ascertain whether victimization and potential resilience/vulnerability factors preceded the onset of psychopathology (Cicchetti, 2013; Meng et al., 2018); they described resilience/vulnerability factors in isolation, likely over-estimating their individual contribution compared to more realistic multivariate models (Fritz et al., 2018); and they identified factors that modify the average risk for psychopathology for subgroups of victimized children with shared features, but did not test if such factors can accurately predict which victimized children do or do not develop psychopathology (Yarkoni and Westfall, 2017). As such, the extent to which established resilience/vulnerability factors can inform individualized risk prediction is unclear.
To address these gaps, we investigated psychiatric risk prediction among members of a nationally-representative prospective British cohort. Building on prediction modeling methods developed for medical conditions (Damen et al., 2016; Hippisley-Cox and Coupland, 2015) and adult psychiatric disorders (Cannon et al., 2016; Fusar-Poli et al., 2017; Hafeman et al., 2017), we developed multivariate individualized risk prediction models (risk calculators) for psychopathology among victimized children (Moons et al., 2009; Steyerberg and Vergouwe, 2014). We also internally validated our findings using a nested cross-validation approach to determine the classification accuracy of prediction models when applied to independent cases in our sample (Steyerberg, 2009).

Participants
Participants were members of the Environmental Risk (E-Risk) Longitudinal Twin Study, which tracks the development of a nationally-representative birth cohort of 2232 twin children born in England and Wales in 1994-1995. Full details about the sample are reported elsewhere (Moffitt and E-Risk Study Team, 2002) and in Supplementary Material. Briefly, the E-Risk sample was constructed in 1999-2000 when 1116 families (93% of those eligible) with same-sex 5-year-old twins participated in home-visit assessments. This sample comprised 56% monozygotic and 44% dizygotic twin pairs; sex was evenly distributed within zygosity (49% male). Families were recruited to represent the UK population of families with newborns in the 1990s, on the basis of residential location throughout England and Wales and mother's age.
Follow-up home-visits were conducted when children were aged 7, 10, 12, and 18 (participation rates were 98%, 96%, 96%, and 93%, respectively). At age 18, 2066 participants were assessed. Average age at time of assessment was 18.4 years (SD = 0.36); all interviews were conducted after the 18th birthday. There were no differences between those who did and did not take part at age 18 in terms of socioeconomic status, assessed when the cohort was initially defined (χ² = 0.86, p = .65), age-5 IQ (t = 0.98, p = .33), age-5 behavioral (t = 0.40, p = .69) or emotional (t = 0.41, p = .68) problems, or childhood polyvictimization (z = 0.51, p = .61). The Joint South London and Maudsley and Institute of Psychiatry Research Ethics Committee approved each study phase. Parents gave informed consent, and twins gave assent between 5 and 12 years and then informed consent at age 18.
The present study focuses on a subset of the E-Risk Study sample who were exposed to any type of severe victimization during childhood (n = 591; 26.5% of sample; 50% male). Information about the victimization measure used to derive this analytic sample is provided below and in Supplementary Material. Based on multivariate logistic regression and odds ratios (ORs), victimized and non-victimized participants did not differ on the distribution of sex (OR = 0.92, 95% CI = 0.76-1.12, p = .40) and ethnicity (White vs non-White; OR = 1.00, 95% CI = 0.72-1.39, p = .99); however, victimized young people were more likely to be in the lowest social class tertile compared to their non-victimized counterparts (OR = 2.91, 95% CI = 2.40-3.54, p < 0.001).

Childhood victimization
Prospective measures of victimization utilized in this cohort are described elsewhere (Danese et al., 2017; Fisher et al., 2015), and in Supplementary Material. In brief, lifetime exposure to several types of victimization was assessed repeatedly when children were 5, 7, 10, and 12 years of age. Comprehensive dossiers were compiled for each child with cumulative information about: exposure to domestic violence between mother and partner; frequent bullying by peers; physical abuse by an adult; sexual abuse; emotional abuse and neglect; and physical neglect, all between birth and age 12 years. Dossiers comprised reports from caregivers, recorded narratives of caregiver interviews, recorded debriefings with research workers who had coded any indications of abuse and neglect at any of the successive home visits, and information from clinicians whenever the study team made a child-protection referral. These were reviewed by two independent researchers and rated for the presence and severity (none/mild/severe) of each type of victimization. In the present study, prospectively-measured victimization was dichotomized to represent 'none/mild' (0) vs 'severe' (1) victimization.

Age-18 psychopathology
Past-year psychopathological symptoms were comprehensively assessed through private interviews at the age-18 follow-up. Ten disorder diagnoses were organized into three domains (internalizing, externalizing, and thought disorders), based on a reliable latent factor structure for psychopathology previously identified within the E-Risk Study (Schaefer et al., 2018). Full information on individual diagnoses is available in Supplementary Material. In brief, in order to effectively evaluate the classification accuracy of each prediction model, binary classifications were derived for each domain, denoting the presence of at least one of the constituent disorders based on diagnostic cut-offs. Participants were classified as having 'internalizing disorder' if they met diagnostic criteria for generalized anxiety disorder, major depressive disorder or post-traumatic stress disorder, or presented at least 2 of 5 eating disorder symptoms from an established screening tool, indicating a possible case of anorexia nervosa or bulimia nervosa (Morgan et al., 1999). Participants were classified as having 'externalizing disorder' if they met diagnostic criteria for attention-deficit/hyperactivity disorder, conduct disorder, alcohol dependence, cannabis dependence, or tobacco dependence. Finally, 'thought disorder' classification was based on the definite presence of at least one of seven psychotic symptoms, centered on delusions and hallucinations. The measure has good construct validity, as it was shown to share many of the same genetic, social, neurodevelopmental, and behavioral risk factors and correlates as adult schizophrenia (Polanczyk et al., 2010). From these three domain-specific classifications, an overall binary outcome for 'any psychiatric disorder' was created, denoting the presence of any internalizing, externalizing, or thought disorder (1), or the absence of all three (0).
The prevalence of each disorder and wider diagnostic domain, within both the overall E-Risk sample and our victimized sub-sample, is presented in Table 1. Overall, out of 553 victimized participants, 60.4% met diagnostic criteria for any of the ten psychiatric disorders, 37.9% met criteria for internalizing disorder, and 43.3% met criteria for externalizing disorder. Regarding comorbidity between internalizing and externalizing domains, among those diagnosed with at least one of their nine underlying disorders, 36.4% (n = 118) had both an internalizing and externalizing disorder, 27.2% (n = 88) had an internalizing disorder alone, and 36.4% (n = 118) had an externalizing disorder alone.

Childhood predictors
Individual-, family-, and community-level predictors were assessed between ages 5 and 12 years. We utilized a recent systematic review of multi-level predictors of maltreatment outcomes (Meng et al., 2018) and mapped these predictors to variables measured in the E-Risk Study (see Supplementary Tables S1 and S2). This approach is preferable to selecting predictors based on statistically-significant bivariate associations with the outcome in the target sample, as it avoids the circular logic of testing the predictive ability of variables already known to be associated with the outcome within that sample (Fusar-Poli et al., 2018). We identified 22 predictors, summarized in Table 2, with detailed descriptions in the Supplementary Material. With 22 predictors, we achieved a minimum events per variable (EPV) ratio of 10 for our main 'any psychiatric disorder' outcome, mitigating potential model instability due to over-fitting (Pavlou et al., 2015).
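As a rough illustration of the events-per-variable check, the sketch below divides the size of the smaller outcome group by the number of candidate predictors, which is one common convention. The function name is hypothetical, and the counts are approximations derived from the percentages reported for the 553 victimized participants with outcome data (60.4% with any psychiatric disorder); they are not the exact counts from the complete-case analytic sample.

```python
def events_per_variable(n_events, n_non_events, n_predictors):
    """EPV, treating the smaller outcome group as the limiting sample size."""
    if n_predictors <= 0:
        raise ValueError("n_predictors must be positive")
    return min(n_events, n_non_events) / n_predictors


# Approximate counts (assumption): 60.4% of 553 participants had any disorder.
n_total = 553
n_events = round(0.604 * n_total)      # participants with any disorder
n_non_events = n_total - n_events      # participants with no disorder
epv = events_per_variable(n_events, n_non_events, n_predictors=22)
print(n_events, n_non_events, round(epv, 2))
```

Under these approximate counts the ratio comes out close to 10 per candidate predictor; using the number of participants with a disorder as the numerator instead would give a larger value.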

Statistical analyses
Analyses were conducted using STATA version 15.0 and R version 3.4.2. First, we tested whether the prevalence of each form of psychopathology significantly differed between victimized and non-victimized participants using logistic regression, correcting for familial clustering using the cluster STATA command.
Second, we developed and internally validated separate prediction models for three age-18 psychiatric outcomes using regularized logistic regression in the glmnet R package (Friedman et al., 2010). Specifically, we evaluated predictive ability for the overall measure of 'any psychiatric disorder', as well as separately for internalizing and externalizing disorders, as the two dimensions do show distinct features and risk profiles, even over and above a general factor for psychopathology (Lahey et al., 2017). We could not reliably estimate a separate model for thought disorder given the low prevalence of psychotic symptoms. Complete data were available in 91.3%, 91.5% and 91.3% of victimized children with available data for 'any psychiatric', 'externalizing' and 'internalizing' disorders, respectively (n = 504-505). Therefore, a complete-cases approach was utilized for model development and internal validation.
A full description of our analyses is available in Supplementary Material. Briefly, we conducted regularized regression using the Least Absolute Shrinkage and Selection Operator (LASSO) to identify subsets of predictors that maximized prediction accuracy for each outcome in unseen cases within the sample. LASSO shrinks coefficients towards zero, thereby reducing the variance of these estimates. Predictors whose coefficients are shrunk to zero are excluded from the model, enabling parsimonious solutions. The degree of shrinkage is determined by a tuning parameter, lambda (λ). Cross-validation was used to identify the optimal λ for a model (Tibshirani, 1996). To obtain an estimate of prediction accuracy in new cases from the same underlying population, we internally validated each model using nested 10-fold cross-validation (see Supplementary Fig. S2; Hastie et al., 2009). Performance measures were based on each child's predicted probability when held out from model selection and estimation as an unseen 'test case'. Discrimination (the model's ability to accurately classify those with and without psychopathology) was visualized using the receiver operating characteristic (ROC) curve, and quantified using the area-under-the-curve (AUC; Steyerberg, 2009). AUC ranges from 0.5 (chance-level) to 1 (perfect discrimination), with the following proposed benchmarks: <0.7 (poor); 0.7-0.8 (acceptable); 0.8-0.9 (excellent); 0.9-1.0 (outstanding; Hosmer et al., 2013). Calibration was assessed by plotting predicted probabilities (grouped into equal-interval bins) against observed outcomes, where a 45° line indicates perfect agreement. A chi-square test of unreliability (U) determined whether each calibration plot's intercept ('calibration-in-the-large') and slope significantly differed from the ideal line (Fenlon et al., 2018). Overall performance was assessed by the proportion of explained deviance ('pseudo-R²'), and the Brier score, or mean squared difference between predicted probabilities and actual binary outcomes (Brier, 1950). We scaled Brier scores by their maximum possible values, which vary based on the incidence of the outcome, from 0% (non-informative) to 100% (perfect; Steyerberg, 2009).
Finally, we carried out two sensitivity analyses. First, we tested whether model performance and/or interpretability would improve if we allowed sets of correlated predictors to be included in our model. To do this, we re-estimated each model using elastic net regularization. Unlike LASSO, this regularization method tends to select or exclude correlated variables as a set, potentially allowing more predictors to be retained. Second, to test whether predictive ability was inflated by the presence of non-independent twin observations, we re-ran analyses in ten subsamples, each consisting of one twin per pair.
Table 1 Prevalence of psychopathology at age 18 among (i) the main E-Risk sample, and (ii) a subsample of E-Risk participants exposed to severe childhood victimization.
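The discrimination and overall-performance measures described in this section can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation (which used R and STATA): the function names and the toy data are assumptions. AUC is computed as the proportion of case/non-case pairs in which the case has the higher predicted risk, and the scaled Brier score follows the common definition of one minus the ratio of the Brier score to its maximum under a non-informative model that predicts the observed prevalence for everyone.

```python
def auc(y_true, y_prob):
    """Share of (case, non-case) pairs where the case has the higher
    predicted risk; tied pairs count one half."""
    cases = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_cases = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in cases
        for q in non_cases
    )
    return wins / (len(cases) * len(non_cases))


def brier(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def scaled_brier(y_true, y_prob):
    """1 - Brier / Brier_max, where Brier_max = prevalence * (1 - prevalence)."""
    prevalence = sum(y_true) / len(y_true)
    brier_max = prevalence * (1 - prevalence)
    if brier_max == 0:
        raise ValueError("outcome has no variation")
    return 1 - brier(y_true, y_prob) / brier_max


# Toy held-out predictions (made-up values for illustration only).
outcomes = [1, 1, 1, 0, 0]
predicted = [0.9, 0.7, 0.4, 0.5, 0.2]
print(auc(outcomes, predicted), brier(outcomes, predicted),
      scaled_brier(outcomes, predicted))
```

In a nested cross-validation setting, the predicted probabilities passed to these functions would be those obtained for each participant while held out from model selection and estimation.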

Prediction modeling
Unstandardized coefficients for the predictors retained in each of the three models following LASSO variable selection (see Table 3) were obtained by re-estimating the cross-validated model using the average optimal λ tuning parameter, identified from ten repeated model development loops. Using each set of coefficients, we derived regression equations to estimate a child's individual risk for an outcome based on his/her values on these retained predictors (see Supplementary Table S3). Additionally, the classification accuracy achieved by each model across various thresholds of predicted risk is presented in Supplementary Tables S4-S6. Model composition and performance are described below for each outcome.
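To make concrete how a regression equation of this kind converts a child's predictor values into an individual risk estimate, the sketch below applies the standard logistic transformation to a linear combination of predictors. The intercept, coefficient values, and predictor names are hypothetical placeholders chosen for illustration; they are not the values reported in Table 3 or Supplementary Table S3.

```python
import math


def predicted_risk(intercept, coefficients, values):
    """Logistic model: risk = 1 / (1 + exp(-(intercept + sum(b_j * x_j))))."""
    linear_predictor = intercept + sum(
        coefficients[name] * values[name] for name in coefficients
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients and predictor names (NOT the values in Table 3).
example_coefficients = {"predictor_a": 0.30, "predictor_b": -0.20, "predictor_c": 0.50}
example_child = {"predictor_a": 1, "predictor_b": -1.0, "predictor_c": 1}
risk = predicted_risk(-0.40, example_coefficients, example_child)
print(round(risk, 3))
```

Predictors excluded by LASSO simply have a coefficient of zero and drop out of the sum, so only the retained predictors need to be measured to apply the equation.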

Any psychiatric disorder
Seventeen of 22 predictors were retained in the final LASSO model (see Table 3, column A), based on an average λ tuning parameter of 0.0109 (range: 0.0075-0.0151). Frequency distributions of predicted risks for victimized children with and without psychopathology at age 18, obtained using nested 10-fold cross-validation, are presented in Fig. 1a. This showed that, beginning at a predicted risk of 0.70, there was a higher proportion of victimized children with versus without any psychiatric disorder in each successive risk class. The ROC curve following internal validation yielded an AUC of 0.69 (95% CI = 0.64-0.73; Fig. 2, solid line), representing a 69% probability that a randomly-selected victimized child with any psychiatric disorder will be assigned a higher predicted risk by the model than a randomly-selected victimized child without psychopathology. The calibration plot (Fig. 3a) showed high agreement between predicted probabilities and observations when the predicted probability was 0.30 and higher; only two participants, or 0.4% of all cases, fell below this point, which likely explains the poorer calibration at this end of the distribution. Calibration-in-the-large (i.e., intercept; α = 0.02) and slope (β = 0.96) did not significantly differ from expected values of a perfectly-calibrated model (U = −0.004, χ²(2) = 0.06, p = .968). Regarding overall performance, the model explained 13.5% of binomial deviance, while the scaled Brier score indicated that the predictors explained 11% of the mean squared error between predicted probabilities and observed outcomes compared to a non-informative model.

Internalizing disorder
[…] (see Table 3, column B), based on a mean λ of 0.0089 (range: 0.0057-0.0118). There was a higher proportion of victimized children with versus without internalizing disorder between predicted risk thresholds of 0.50 and 0.89 (see Fig. 1b).

Externalizing disorder
LASSO regularization selected 15 predictors (see Table 3, column C) from a mean λ of 0.0146 (range: 0.0105-0.0178). Based on frequency distributions for predicted risk, there was a higher proportion of victimized children with versus without externalizing disorder from a predicted risk of 0.50 and upwards (see Fig. 1c). Discrimination was denoted by an AUC of 0.73 (95% CI = 0.69-0.77; Fig. 2, dashed line), while calibration-in-the-large (α = −0.01) and slope (β = 1) values did not diverge significantly from those of a perfectly-calibrated model (U = −0.004, χ²(2) = 0.003, p = .999; see Fig. 3c). Overall, the model explained 15.4% of the deviance, with the scaled Brier score indicating a 15.3% reduction in mean squared error compared to a non-informative model.

Regularization penalty
For all three outcomes, model performance statistics (see Supplementary Table S7) and corresponding plots (see Supplementary Figs S3 and S4) obtained from nested 10-fold cross-validation using less restrictive 'elastic net' regularization resembled those generated using LASSO, implying that predictive ability was not significantly worsened by applying the most parsimonious form of regularization to model coefficients.
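For readers less familiar with the two penalties, the difference can be written compactly. The expression below is the penalized logistic regression objective in the form documented for the glmnet package; the mixing parameter is written here as a (glmnet calls it alpha) to avoid confusion with the calibration intercept α reported above, and the particular mixing value used in the sensitivity analysis is not stated in this section.

```latex
\min_{\beta_0,\,\beta}\;
  -\frac{1}{N}\sum_{i=1}^{N}
    \Bigl[\, y_i\,(\beta_0 + x_i^{\top}\beta)
      - \log\bigl(1 + e^{\beta_0 + x_i^{\top}\beta}\bigr) \Bigr]
  \;+\; \lambda \Bigl[\, \tfrac{1-a}{2}\,\lVert \beta \rVert_2^{2}
      + a\,\lVert \beta \rVert_1 \Bigr],
  \qquad 0 \le a \le 1
```

Setting a = 1 leaves only the absolute-value penalty and gives LASSO, whereas values between 0 and 1 add a squared penalty that tends to keep correlated predictors in the model together.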

Non-independence of twins
Average cross-validated performance statistics for each outcome across 10 random single-twin subsamples (n = 304-305) resembled those obtained using the full sample (see Supplementary Tables S8-S10), suggesting that the inclusion of non-independent observations did not significantly bias predictive accuracy.
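The single-twin subsampling step can be sketched as follows. This is an illustrative assumption about how such subsamples might be drawn, not a description of the authors' code: the field name family_id, the function names, and the toy records are hypothetical. Families contributing only one victimized twin are kept as they are.

```python
import random


def sample_one_twin_per_pair(records, seed):
    """Return one randomly chosen record per family."""
    rng = random.Random(seed)
    by_family = {}
    for record in records:
        by_family.setdefault(record["family_id"], []).append(record)
    return [rng.choice(members) for members in by_family.values()]


def single_twin_subsamples(records, n_subsamples=10, base_seed=0):
    """Draw several single-twin subsamples, each with its own seed."""
    return [
        sample_one_twin_per_pair(records, base_seed + k)
        for k in range(n_subsamples)
    ]


# Toy data: three complete pairs plus one family with a single participant.
records = [{"family_id": f, "twin": t} for f in (1, 2, 3) for t in ("a", "b")]
records.append({"family_id": 4, "twin": "a"})
subsamples = single_twin_subsamples(records)
print(len(subsamples), [len(s) for s in subsamples])
```

Each subsample would then be passed through the same model development and validation procedure, and the resulting performance statistics averaged across subsamples.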

Discussion
We found that factors known to be associated with psychopathology among victimized children can be statistically modeled to predict individual risk for psychopathology during the transition to adulthood. All three models were well-calibrated, and discrimination, while only adequate (AUC = 0.66-0.73), was within the range of established risk calculators for medical conditions (Damen et al., 2016; Wang et al., 2018), as well as recently-developed prediction models in psychiatry (Cannon et al., 2016; Fusar-Poli et al., 2017; Hafeman et al., 2017). Moreover, because there is currently no evidence base for evaluating individual risk among victimized children, these data-driven approaches are likely to improve on current practice owing to their objectivity and consistency. Our results also suggest a need to consider prediction of internalizing and externalizing psychopathology separately, given observed variations in predictive performance and model configuration; for example, while all five personality dimensions were retained in the prediction of internalizing disorder, only extraversion contributed to the prediction of externalizing disorder.
Our findings should be interpreted in light of several limitations. First, as the E-Risk Study comprises twins, we cannot be certain that results generalize to singletons. However, the prevalence of victimization in our sample is comparable to recent UK general population estimates (Radford et al., 2013), while rates of psychopathology are generally comparable between twins and singletons (Gjone and Novik, 1995). Sensitivity analyses using one twin from each family also suggested that the presence of twins did not inflate predictive accuracy estimates. Second, as findings were based on a community sample of British twins, they may not generalize to other samples, for example children in foster care. External validation is needed to test the relevance of our findings to different contexts before clinical implementation. Third, unlike conventional regression approaches, our prediction models do not permit interpretation of individual coefficients. Specifically, in applying a penalty to reduce over-fitting to the data, LASSO regularization introduces bias into the regression estimates, such that coefficients are no longer reflective of true population-level associations with the outcome. Moreover, LASSO models perform inbuilt variable selection based around the ability of a risk factor to accurately predict variation in the outcome. In this way, we maximize predictive performance in unseen cases but limit model interpretability, as coefficients cannot be used to infer a causal relationship with the outcome; indeed, many variables that are important for prediction may only exert small effects when evaluated based on statistical significance or effect size (Yarkoni and Westfall, 2017). Instead, these coefficients should be viewed as components of the multifactorial risk profile represented by each model's specific combination of predictors.
Fig. 2. Receiver operating characteristic (ROC) curves for the three prediction models among victimized children. ROC curves plot the true positive rate (sensitivity; the proportion of actual positive outcomes correctly identified as such) against the false positive rate (1 − specificity; the proportion of actual negative outcomes incorrectly classified as positive). The solid diagonal line denotes chance-level discrimination.
Despite these limitations, the paper makes important methodological contributions to promote use of prediction modeling techniques in the context of childhood victimization. Compared to previous research, our prospective design ensured appropriate temporal ordering of effects from birth to early adulthood, the developmental point by which most psychiatric disorders have emerged (Kim-Cohen et al., 2003), while a rich characterization of individual-, family-, and community-level characteristics within our sample allowed for comprehensive multivariate models that incorporated potential determinants of both risk and resilience. Moving beyond the group-based average effects estimated in conventional regression analyses, our prediction modeling approach yielded an individualized risk estimate for each victimized child, quantifying their unique likelihood of developing psychopathology. Moreover, our internal validation procedure (i.e., nested cross-validation) ensured that each participant's risk score was calculated when they were excluded from model development, thus reducing over-fitting and increasing the generalizability of our models to unseen cases (Steyerberg, 2009). Finally, automatic variable selection by LASSO regularization suggested that predictors with established univariate-significant associations may not offer an independent contribution to predictive accuracy when considered in a more systemic context.
Overall, these initial findings provide proof-of-principle evidence that a range of individual, family, and community factors can be combined to derive individualized risk scores for psychopathology among victimized young people. With sufficient external validation, prediction modeling has the potential to enhance evidence-based clinical decision-making for this vulnerable population in social work and child and adolescent psychiatry settings and, in turn, inform more rational allocation of limited resources. However, although such risk calculators could introduce greater objectivity and consistency to clinical practice, they should be used to support, rather than replace, professional judgments. Crucially, good predictive performance, even in external samples, does not guarantee clinical usefulness. For example, evaluating overall predictive accuracy across a range of decision thresholds for absolute risk classification (i.e., 'low-risk' versus 'high-risk'), we found that cut-points of 60%, 40%, and 40% provided an optimal balance of sensitivity and specificity in predicting any psychiatric disorder, internalizing disorder, and externalizing disorder, respectively (see Supplementary Tables S4-S6). However, the appropriate risk threshold is likely to vary depending on the cost-benefit ratio of the corresponding clinical decision. For example, clinicians could opt for a lower cut-off during initial screening, where the benefit of early detection of true-positive cases may outweigh the cost of higher false-positives, but may prefer a higher threshold for decisions around treatment allocation to ensure that places are allocated to those with the greatest need. Furthermore, classification is a forced choice, and individual predicted probabilities of risk should be presented to both clinicians and patients so that an informed choice can be made once all information is available. A decision should not rest solely on a prediction model, which can draw only on the data included in it. Therefore, the clinical context, net-benefit and cost-effectiveness of any risk calculator should be evaluated in consultation with key stakeholders (Steyerberg and Vergouwe, 2014).
Fig. 3. Calibration plots for (a) any psychiatric, (b) internalizing, and (c) externalizing disorder in victimized children, showing overall agreement between model-predicted risks and observed outcomes. The broken diagonal line represents a perfectly-calibrated model. Error bars represent 95% confidence intervals. Each data-point denotes a probability bin, while its size indicates the relative number of cases within that bin.
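The trade-off between sensitivity and specificity at different risk cut-points can be illustrated with a short sketch. The function name and the toy predictions below are assumptions for illustration, and the values are not drawn from Supplementary Tables S4-S6; a child is labeled 'high-risk' when the predicted probability is at or above the chosen threshold.

```python
def sensitivity_specificity(y_true, y_prob, threshold):
    """Classify as high-risk when predicted risk >= threshold, then return
    (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for y, p in zip(y_true, y_prob):
        high_risk = p >= threshold
        if y == 1 and high_risk:
            tp += 1
        elif y == 1:
            fn += 1
        elif high_risk:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one non-case")
    return tp / (tp + fn), tn / (tn + fp)


# Toy held-out predictions (made-up values for illustration only).
outcomes = [1, 1, 1, 0, 0]
predicted = [0.9, 0.7, 0.4, 0.5, 0.2]
for cut_point in (0.4, 0.6):
    sens, spec = sensitivity_specificity(outcomes, predicted, cut_point)
    print(cut_point, round(sens, 2), round(spec, 2))
```

In this toy example the lower cut-point identifies every case at the cost of more false positives, while the higher cut-point misses a case but produces no false positives, mirroring the screening versus treatment-allocation contrast described above.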

Conclusion
The present study provides initial evidence in support of the use of multivariable prediction modeling to derive individualized risk estimates for psychopathology among victimized children. Although we acknowledge a need for external validation and evaluation of clinical usefulness before these models can be confidently integrated into decision-making, they nevertheless offer the potential to supplement current practice, ultimately providing more personalized care for some of the most vulnerable children in society.

Declaration of Competing Interest
The authors have no conflicts of interest to declare.

Role of the funding source
The E-Risk Study is funded by the UK Medical Research Council (grant number G1002190). Additional support was provided by the US National Institute of Child Health and Human Development (grant number HD077482); the Jacobs Foundation; and a research grant from the National Society for Prevention of Cruelty to Children (NSPCC) and Economic and Social Research Council (ESRC). HLF was supported by an MQ Fellows Award (grant number MQ14F40). LA is the Mental Health Leadership Fellow for the ESRC. This paper represents independent research part-funded by the National Institute for Health Research (NIHR) Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London to AD. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care.