Meta-analytical prognostic accuracy of the Comprehensive Assessment of at Risk Mental States (CAARMS): The need for refined prediction

Abstract
Primary indicated prevention is reliant on accurate tools to predict the onset of psychosis. The gold standard assessment for detecting individuals at clinical high risk (CHR-P) for psychosis in the UK and many other countries is the Comprehensive Assessment for At Risk Mental States (CAARMS). While the prognostic accuracy of CHR-P instruments has been assessed in general, this is the first study to specifically analyse that of the CAARMS. As such, the CAARMS was used as the index test, with the reference index being psychosis onset within 2 years. Six independent studies were analysed using MIDAS (STATA 14), with a total of 1876 help-seeking subjects referred to high risk services (CHR-P+: n = 892; CHR-P–: n = 984). Area under the curve (AUC), summary receiver operating characteristic curves (SROC), quality assessment, likelihood ratios, and probability-modifying plots were computed, along with sensitivity analyses and meta-regressions. The current meta-analysis confirmed that the 2-year prognostic accuracy of the CAARMS is only acceptable (AUC = 0.79, 95% CI: 0.75–0.83) and not outstanding as previously reported. In particular, specificity was poor. Sensitivity of the CAARMS is inferior compared to the SIPS, while specificity is comparably low. However, due to the difficulties in performing these types of studies, power in this meta-analysis was low. These results indicate that refining and improving the prognostic accuracy of the CAARMS should be the mainstream area of research for the next era. Avenues of prediction improvement are critically discussed and presented to better benefit patients and improve outcomes of first episode psychosis.


Introduction
Psychosis is a severe psychiatric condition and there is limited evidence that treatments are successful in improving patients' functioning once the disorder is established [1]. Intervening in the earlier phases is therefore the only viable possibility to substantially alter the course of the disorder [2,3]. Within early intervention, a key focus for improving the outcome has been primary indicated prevention [2,4,5]. Primary indicated prevention allows for early intervention for those at clinical high risk of developing psychosis (CHR-P), with greater scope for improving outcomes. To do this effectively, the first necessary step is to reach an accurate, robust prognostic identification of individuals meeting CHR-P criteria who will subsequently develop psychosis or not. Ideally, all subjects who will actually develop psychosis should be classified as "at risk" (CHR-P+) while those not developing an established psychosis should be classified as "not at risk" (CHR-P-). These key concepts involved in prognostic reasoning in the CHR-P have been detailed and presented in a recent paper by our group [6].
Prognostic prediction is used in many branches of medicine to identify individuals who may develop a particular disease [7]. For example, fasting glucose, oral glucose tolerance test and glycated haemoglobin A1C [8] and systolic blood pressure and ratio of total serum cholesterol to high density lipoprotein cholesterol levels are used to detect individuals at high risk for developing cardiovascular disease [9]. However, unlike these other fields, there are no biological tests to assess the risk of developing mental disorders [10], which is instead reliant on semi-structured CHR-P psychometric interviews, such as the CAARMS (Comprehensive Assessment for At Risk Mental States) [11]. Recently, the CAARMS has become the mainstream tool to detect CHR-P individuals in the UK, recommended by international bodies, such as NICE [12]. Therefore, understanding its exact psychometric properties is of paramount clinical relevance. The CAARMS shows excellent inter-rater reliability when performed by trained raters (0.85) [13]. However, its prognostic accuracy is uncertain. A recent meta-analysis by our lab [14] investigated the prognostic accuracy of CHR-P instruments, showing generally excellent prognostic performance of these instruments. However, CHR-P tools were grouped together including the CAARMS [11], the SIPS (Structured Interview for Prodromal Syndromes) [15] and the SPI-A (Schizophrenia Proneness Instrument-Adult Version) [16]. This was due to the fact that there were not enough studies contributing data to assess the meta-analytical prognostic accuracy of the CAARMS specifically. Given the marked differences between the CAARMS and other CHR-P instruments [17], in particular with respect to the functional deterioration criterion [18], it is possible that the previously reported meta-analytical prognostic accuracy is not completely accurate. In addition, the previous meta-analysis combined multiple follow-up time points, and even though meta-regressions of this variable found no significant effect, validity of the prognostic accuracy results would be improved by using a more defined and consistent follow-up time [14].
The current study tackles these caveats and advances knowledge in the psychometric properties of the CAARMS. We capitalize on recently published CAARMS studies reporting useful and innovative meta-analytical data to conduct a meta-analytical prognostic accuracy analysis of the CAARMS at two-year follow-up. This is the period of time during which most transitions to psychosis occur [19]. The results will hopefully support the refinement of psychosis prediction and therefore facilitate indicated primary prevention in CHR-P individuals.

Search strategy
Two investigators (DO, PFP) conducted a two-step literature search. At a first step, the Web of Knowledge database was searched, incorporating both the Web of Science and Medline. The search was extended until August 2017, only including abstracts in English. The electronic research adopted several combinations of the following keywords: "at risk mental state", "psychosis risk", "prodrome", "prodromal psychosis", "ultra-high risk", "high risk", "help-seeking", "diagnostic accuracy", "sensitivity", "specificity", "psychosis prediction", "psychosis onset". The second step involved the use of Scopus to investigate citations of previous systematic reviews on transition outcomes in CHR-P subjects and a manual search of the reference lists of the retrieved articles.
Articles identified through these two steps were then screened for the selection criteria on the basis of abstract reading. The articles surviving this selection were assessed for eligibility on the basis of full text reading. To achieve a high standard of reporting, we adopted the Meta-analysis Of Observational Studies in Epidemiology (MOOSE) checklist [20].

Selection criteria
Studies were eligible for inclusion if: they were reported in original articles, written in English; they had used the CAARMS (index test) in the same pool of referrals; they had followed up both CHR-P+ and CHR-P- subjects for psychosis onset (reference index) using established international diagnostic manuals (ICD or DSM); they had reported sufficient prognostic accuracy data at 2-year follow-up.
With respect to this last point, when data were not directly presented, they were indirectly extracted from associated data. Additionally, we contacted all corresponding authors to request additional data when needed.
We excluded: abstracts, reviews, articles in a language other than English; studies in which interviews were not conducted in the same pool of referrals or that used an external CHR-P group of healthy controls; studies with overlapping datasets.
In case of multiple publications deriving from the same study population, we selected the article reporting the largest and most recent data set. The literature search was summarized according to PRISMA guidelines [21].

Recorded variables
Data extraction was independently performed by two investigators (DO, PFP). Data included author, year of publication, characteristics of subject samples (baseline sample sizes, mean age and age range, proportion of females), diagnostic criteria used at follow-ups to assess the psychotic outcome, prognostic accuracy data (number of true and false positives, true and false negatives or associated data) and quality assessment conducted with the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) checklist [22].

Statistical analysis
The statistical analysis followed the Cochrane Guidelines for Systematic Reviews of Diagnostic Test Accuracy, Version 1.0 [23] and the Methods Guide for Authors of Systematic Reviews of Medical Tests by the Agency for Healthcare Research and Quality (chapter 8) [24]. Evaluating test accuracy requires knowledge of two quantities: the test's sensitivity (Se) and specificity (Sp). Meta-analysis methods for diagnostic test accuracy thus have to deal with two summary statistics simultaneously rather than one [23]. Methods for undertaking analyses, which account for both Se and Sp, the relationship between them, and the heterogeneity in test accuracy, require fitting advanced hierarchical random effects models [23].
For each study, we constructed a two-by-two table, which included true positive, false positive, true negative, and false negative values. The baseline sample size was conservatively used as the base reference.
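As an illustration of how a per-study two-by-two table yields Se and Sp, the following is a minimal sketch. The `TwoByTwo` class and the counts are hypothetical and are not taken from any of the included studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical container for one study's two-by-two table."""
    tp: int  # CHR-P+ at baseline, developed psychosis
    fp: int  # CHR-P+ at baseline, did not develop psychosis
    fn: int  # CHR-P- at baseline, developed psychosis
    tn: int  # CHR-P- at baseline, did not develop psychosis

    def sensitivity(self) -> float:
        # Proportion of subjects who developed psychosis that were CHR-P+
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of subjects who did not develop psychosis that were CHR-P-
        return self.tn / (self.tn + self.fp)


# Illustrative counts only (assumption, not study data)
example = TwoByTwo(tp=18, fp=82, fn=2, tn=98)
print(f"Se = {example.sensitivity():.2f}, Sp = {example.specificity():.2f}")
```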
Data were then analysed with MIDAS (Meta-analytical Integration of Diagnostic Accuracy Studies) [25], a comprehensive program of statistical and graphical routines for undertaking meta-analysis of diagnostic/prognostic test performance in STATA 14 software [26]. The index tests of CHR-P status (CHR-P+ or CHR-P-) and reference tests of transition to psychosis according to international diagnostic manuals (ICD or DSM as gold standard) were dichotomous.
Primary data synthesis was performed within the bivariate mixed-effects regression framework for the logit transforms of Se and Sp. In addition to accounting for study size, the bivariate model estimates and incorporates the intrinsic negative correlation that may arise between Se and Sp within studies (threshold effect) [27], as a result of differences in the test threshold between studies [28]. The bivariate model allows for heterogeneity beyond chance as a result of clinical and methodological differences between studies [28].
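For orientation, the bivariate mixed-effects framework is commonly written as below. This is a sketch of the standard formulation, assuming exact binomial within-study likelihoods; the symbols ($n_{1i}$, $n_{0i}$, $\mu$, $\sigma$) are notation introduced here and the exact parameterisation used by MIDAS may differ.

```latex
\begin{align}
\mathrm{TP}_i &\sim \mathrm{Binomial}(n_{1i}, \mathrm{Se}_i), &
\mathrm{TN}_i &\sim \mathrm{Binomial}(n_{0i}, \mathrm{Sp}_i), \\
\begin{pmatrix} \operatorname{logit}\mathrm{Se}_i \\ \operatorname{logit}\mathrm{Sp}_i \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_{\mathrm{Se}} \\ \mu_{\mathrm{Sp}} \end{pmatrix},
\begin{pmatrix} \sigma_{\mathrm{Se}}^{2} & \sigma_{\mathrm{Se,Sp}} \\ \sigma_{\mathrm{Se,Sp}} & \sigma_{\mathrm{Sp}}^{2} \end{pmatrix}
\right)
\end{align}
```

Here $n_{1i}$ and $n_{0i}$ denote the numbers of subjects in study $i$ who did and did not develop psychosis, and a negative covariance $\sigma_{\mathrm{Se,Sp}}$ corresponds to the threshold effect described above.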
We estimated the summary Se and Sp and the hierarchical SROC (summary receiver operator characteristic) curves [23,32]. A SROC graph across each predictor, with the y-axis representing the predictor's Se and the x-axis representing 1 − specificity, was used to plot a 95% confidence region and a 95% prediction region around the summary estimates to illustrate the precision with which the summary values were estimated (confidence ellipse of a mean), and to show the amount of between-study variation (prediction ellipse; the likely range of values for a new study). We also estimated the AUC (area under the curve). The AUC serves as a global measure of test performance. Values in the range of 0.9-1 are considered outstanding, between 0.8 and 0.9 are considered excellent, and between 0.7 and 0.8 are considered acceptable [29].
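The AUC bands above can be expressed as a small lookup. This is a sketch only: the function name `classify_auc`, the handling of boundary values (lower bound inclusive), and the label for values below 0.7 are assumptions, since the text does not specify them.

```python
def classify_auc(auc: float) -> str:
    """Map an AUC value to the qualitative bands quoted in the text."""
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc >= 0.9:
        return "outstanding"
    if auc >= 0.8:
        return "excellent"
    if auc >= 0.7:
        return "acceptable"
    # The text gives no label for this range; this label is an assumption.
    return "below acceptable"


print(classify_auc(0.79))  # the summary AUC reported for the CAARMS
```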
Heterogeneity across studies was assessed using the I², with values of 25%, 50% and 75% representing mild, moderate and severe inconsistency, respectively [30]. Within MIDAS, forest plots and heterogeneity statistics can be created for each test performance parameter individually or may be displayed as paired plots. Meta-regressions were used to examine the influence of mean age, gender (% females), sample size, and quality assessment (QUADAS) on meta-analytical estimates. Furthermore, we investigated the prognostic accuracy difference between CAARMS based studies and studies employing the SIPS, as detected in the previous meta-analysis [14]. To control for biases associated with imbalanced datasets [31], we further tested the impact of the proportion of CHR-P+ subjects in the overall samples. The meta-regressions were used if there was substantial heterogeneity (I² > 50%) [32] and when more than 10 studies were available.
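A minimal sketch of the heterogeneity rule described above, assuming the usual univariate definition of I² from Cochran's Q (the heterogeneity statistics reported by MIDAS may be computed differently). The function names and the example Q value are hypothetical.

```python
def i_squared(q: float, k: int) -> float:
    """I^2 (in %) from Cochran's Q and the number of studies k."""
    if k < 2:
        raise ValueError("at least two studies are required")
    df = k - 1
    if q <= 0:
        return 0.0
    # Share of total variation attributed to between-study heterogeneity
    return max(0.0, (q - df) / q) * 100.0


def meta_regression_allowed(i2: float, n_studies: int) -> bool:
    """Rule stated in the text: I^2 > 50% and more than 10 studies."""
    return i2 > 50.0 and n_studies > 10


i2 = i_squared(q=20.0, k=6)  # hypothetical Q for six studies
print(i2, meta_regression_allowed(i2, n_studies=6))
```

With only six studies the second condition fails regardless of I², which is why most of the planned meta-regressions could not be run.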
Sensitivity analyses (i.e., exclusion of outliers and rerunning of the model) were conducted to further explore heterogeneity. We did not test publication bias [33], because no proven statistical method exists for this type of meta-analysis [34].
In a second step, we employed the probability-modifying plot to estimate the clinical or patient-relevant utility of the CAARMS in subjects seeking help at CHR-P services.
The clinical utility was evaluated using the positive and negative likelihood ratios (LR+ and LR-) to calculate post-test probability (post-TP) based on Bayes' theorem (with pre-test probability, pre-TP, being the prevalence of the condition in the target population), as follows: post-TP = (LR × pre-TP) / [(1 − pre-TP) + (pre-TP × LR)] [27]. Specifically, the probability-modifying plot [25] is a graphical sensitivity analysis of the test's predictive values across a baseline psychosis risk continuum in people seeking help at CHR-P services. It depicts separate curves for positive and negative tests and uses general summary statistics (i.e., unconditional positive and negative predictive values, PPV and NPV, which permit underlying psychosis risk heterogeneity) to evaluate the effect of the CHR-P assessment on predictive values [35]. The pre-TP probability of psychosis risk in subjects seeking help at early detection services was computed in the current dataset as the proportion of subjects developing psychosis on the total baseline sample (CHR-P+ plus CHR-P-) [25].
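The Bayes update behind the probability-modifying plot can be sketched as follows. The function names are hypothetical, and the inputs are the rounded summary values reported in the results of this paper (pre-TP = 0.09, LR+ = 1.9, LR- = 0.25), so the outputs only approximate the published post-test risks.

```python
def post_test_probability(pre_tp: float, lr: float) -> float:
    """post-TP = LR * pre-TP / [(1 - pre-TP) + pre-TP * LR]."""
    if not 0.0 <= pre_tp <= 1.0:
        raise ValueError("pre-test probability must lie between 0 and 1")
    if lr < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    return lr * pre_tp / ((1.0 - pre_tp) + pre_tp * lr)


def probability_modifying_curve(lr: float, steps: int = 10) -> list[tuple[float, float]]:
    """(pre-TP, post-TP) pairs on an evenly spaced grid from 0 to 1."""
    return [(i / steps, post_test_probability(i / steps, lr)) for i in range(steps + 1)]


post_pos = post_test_probability(0.09, 1.9)   # after a CHR-P+ result
post_neg = post_test_probability(0.09, 0.25)  # after a CHR-P- result
print(f"post-TP (CHR-P+) = {post_pos:.3f}, post-TP (CHR-P-) = {post_neg:.3f}")
```

A likelihood ratio of 1 leaves the pre-test probability unchanged, which corresponds to the diagonal of the plot.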
Statistical tests were two-sided and statistical significance was defined as P values < 0.05.

Database
The literature search produced 6 independent studies [36][37][38][39][40][41] that met inclusion criteria with a total of 1876 subjects (CHR+: n = 892; CHR-: n = 984) referred to clinical high risk services. The dataset was balanced, with CHR+ individuals composing 47.5% of the total subjects. The characteristics of the studies are reported in Table 1, while the PRISMA diagram is depicted in Fig. 1. The MOOSE checklist is reported in eTable 1. The detailed QUADAS assessment is reported in eTable 2 and eFig. 1.

Clinical utility of the CAARMS at 2 years
The 2-year psychosis transition risk in the 1876 subjects was 0.09 (95% CI = 0.05-0.13). On the basis of the prior distribution, the continuous relationship between pre-TP and post-TP probability is summarized in Fig. 3. Being CHR-P+ was associated with a 0.16 (95% CI = 0.10-0.22) risk of developing psychosis within 2 years, with a small LR+ of just 1.9 (95% CI = 1.5-2.4), while being CHR-P- was associated with a 0.03 (95% CI = 0.02-0.05) risk of transition to psychosis, with a moderate LR- of 0.25 (95% CI = 0.13-0.48).

Meta-regressions and sensitivity analyses
Sensitivity analysis suggested that one study [41] was influential with a Cook's distance > 1. While we hypothesised this was due to the study reporting 0 false negatives, we were unable to test the effect of false negatives through meta-regression due to the low number of studies. Similarly, we were unable to perform meta-regressions for age, gender, QUADAS score or sample size as there were fewer than 10 studies contributing data. As indicated in the methods, we were able to perform a meta-regression comparing the prognostic accuracy of the CAARMS vs. that of the SIPS, using only the studies reporting SIPS data [42][43][44][45][46] identified in our previous study (n = 5, CHR+: n = 783; CHR-: n = 360). As indicated in Fig. 4, Se was significantly higher (P < 0.001) for the SIPS (n = 5, mean = 0.95, 95% CI 0.91-0.99) compared to the CAARMS (n = 6, mean = 0.87, 95% CI 0.79-0.96), while Sp was comparably (P = 0.27) low in the SIPS (n = 5, mean = 0.45, 95% CI = 0.38-0.53) and in the CAARMS (n = 6, mean = 0.55, 95% CI 0.48-0.62).
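To relate the Se and Sp estimates above to the likelihood ratios reported for the CAARMS, the standard definitions LR+ = Se / (1 − Sp) and LR- = (1 − Se) / Sp can be applied. The sketch below uses the rounded CAARMS means quoted in this section (Se = 0.87, Sp = 0.55); the function name is hypothetical, and the results only approximate the model-based likelihood ratios.

```python
def likelihood_ratios(se: float, sp: float) -> tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity."""
    if not (0.0 < se < 1.0 and 0.0 < sp < 1.0):
        raise ValueError("Se and Sp must lie strictly between 0 and 1")
    lr_pos = se / (1.0 - sp)    # how much a positive test raises the odds
    lr_neg = (1.0 - se) / sp    # how much a negative test lowers the odds
    return lr_pos, lr_neg


lr_pos, lr_neg = likelihood_ratios(se=0.87, sp=0.55)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

The values obtained this way are close to the reported LR+ of 1.9 and LR- of 0.25, illustrating that the low Sp is what keeps LR+ small.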

Discussion
This is the first meta-analysis specifically investigating the prognostic accuracy of the CAARMS for the prediction of psychosis. We found 6 studies that investigated prognostic accuracy of the CAARMS at two-year follow-up, which contributed a relatively large database of 1876 subjects overall, with 892 considered CHR-P+ and 984 CHR-P-. Prognostic accuracy of the CAARMS in terms of AUC was found to be only acceptable (0.79), mostly mediated by its substantial ability to rule out psychosis (i.e. LR- was relatively small and Se high). However, this was at the expense of ruling in psychosis (i.e. LR+ was small and Sp was poor). While prognostic accuracy was overall acceptable, this study indicates that refining the prediction of outcomes should be the key priority of future research in this field.
The primary aim of the study was to synthesize available data for the prognostic accuracy of the CAARMS in determining psychosis risk 2 years after young help-seeking subjects presented to CHR-P services. As noted in the introduction, our recent meta-analysis [14] looked into the prognostic accuracy of CHR-P instruments as a collective. The current study advances knowledge indicating that the exact prognostic accuracy of the CAARMS alone is weaker (0.79) than the overall value previously observed when the CHR-P instruments were pooled together (0.90). Although not as outstanding as before, the AUC value reported here is still considered to be acceptable for a diagnostic test and is comparable to other prognostic tools used in different areas of medicine, such as the AUC = 0.76 attributed to the Cambridge risk score for prediabetes [47]. In a similar fashion, we found that the Se (0.86) of the CAARMS alone was less impressive than the Se (0.96) of CHR-P instruments assessed in the previous meta-analysis. Interestingly, there was an apparent minor increase in Sp (0.55 for CAARMS alone compared to 0.47 for CHR-P instruments generally) [14]. The lower AUC compared to the previous general estimate may reflect profound operationalization differences between the CAARMS and the other CHR-P instruments. For example, a comparative analysis between the CAARMS and the SIPS confirmed caseness discrepancies between the two instruments [17], mostly due to different definition of brief limited intermittent psychotic cases, ascertainment of comorbidities [17] and of functional level at intake [18]. To directly test the effect of these differences on the prognostic performance, in the current study, we performed the first meta-analytical comparison of Se and Sp across the CAARMS and SIPS, using previously published SIPS data [14]. We found that Se was higher in the SIPS compared to the CAARMS, while there were no substantial differences in Sp. Overall, it is unlikely that these differences may account for significant differences in the positive predictive values of the two instruments, as confirmed by previous meta-analyses in CHR-P+ samples [18].
In a second step, we estimated the clinical utility of the CAARMS. As previously reported by our lab, clinical utility is not static, instead reliant on the underlying pre-test risk in any given population [49][50][51]. We found that being classified as CHR+ by the CAARMS is associated with a 16.4% risk of developing psychosis within 2 years, which is lower than the 29.1% 2-year transition risk previously reported [52]. This was driven by a small LR+ (1.9), similar to the LR+ seen previously (1.82) [14]. CHR- individuals had a 3.38% 2-year transition rate and this was driven by a moderate LR- (0.25), which was not as strong as the LR- for CHR assessments as a whole (0.09) [14]. These findings taken altogether indicate that the acceptable prognostic accuracy is due to an imbalance between Se and Sp and between LR+ and LR-, with the CAARMS being a valuable tool to correctly identify individuals who will develop psychosis, but showing only modest ability to identify those who will not.

Table 1. Independent studies included in the meta-analysis (studies n = 6; 1876 subjects; CHR+: n = 892; CHR-: n = 984). Surviving column labels: Study; QUADAS score (14 = max); exposure to antipsychotics at baseline.
On a pragmatic level, the results of this meta-analysis show that the merely acceptable prognostic accuracy of the CAARMS needs improving through a refined assessment of psychosis risk. An improved detection of individuals who will transition would lead to improved clinical and research opportunities. For example, a greater proportion of true positives would lead to more efficient primary indicated prevention as well as a more homogeneous CHR-P group [48,53] for developing putative treatments. This manuscript has the clinical potential to be the reference point for refining future versions of the CAARMS or for the development of refined prognostic tools and assessments. To improve prediction of psychosis, it seems necessary to tailor it on an individual level. To date, the CAARMS has just considered CHR-P+ individuals as belonging to a whole group. However, it is now clear that such an assumption is incorrect, given the profound difference in level of psychosis risk observed across different CHR-P+ subgroups [48,53]. Furthermore, to date, psychosis prediction has been limited to the assessment and rating of CHR-P symptoms and signs. However, it is evident that these are only epiphenomena of underlying neurobiological and psychological processes that may characterize the onset of psychosis in vulnerable individuals. Research evidence in the field of risk and protective factors associated with an impending vulnerability to psychosis has accumulated over the past few decades and only recently has it been systematically assessed. In a recent large-scale meta-analysis, our lab has stratified the level of evidence for associations of several risk or protective factors and established psychotic disorders [54]. This study may lay the groundwork for investigating how specific risk or protective factors accumulate in CHR-P+ individuals explaining their increased liability to develop psychosis.
In a first attempt by our lab [55], we reviewed forty-four studies encompassing 170 independent datasets and 54 risk/protective factors in CHR-P+ individuals. We showed that CHR-P+ individuals were more likely to show obstetric complications, tobacco use, physical inactivity, childhood trauma/emotional abuse/physical neglect, high perceived stress, childhood and adolescent low functioning, affective comorbidities, male gender, single status, unemployment and low educational level as compared to controls. The differential accumulation of these factors in each CHR-P+ individual is likely to account for the different outcomes observed in these samples, such as psychosis onset, persistence of CHR-P+ features or remission. A refinement of psychosis prediction in these samples would inevitably require a careful investigation of these factors beyond the rating of severity and frequency of CHR-P+ symptoms as currently required by the CAARMS.
Fig. 3. Meta-analytical probability-modifying plot, illustrating the relationship between the pre-test probability (pre-TP) and post-test probability (post-TP) i.e. psychosis risk in help-seeking subjects following CAARMS assessment, computed as the likelihood of a positive (above diagonal line; LR+) or negative (below diagonal line; LR-) test result over the pre-TP between 0 and 1.
Fig. 4. Meta-regression analyses comparing at meta-analytical level the 2-year sensitivity and specificity of the CAARMS vs. SIPS for the prediction of psychosis. SIPS data taken from [14].

Limitations
Some limitations of this meta-analysis need to be acknowledged. Firstly, only 6 studies could be synthesised, and although they supplied a healthy number of subjects, statistical power could be questioned given the small number of studies. However, conducting longitudinal studies in individuals assessed for a CHR-P state but not meeting intake criteria is logistically challenging and therefore only a few studies are currently available. Secondly, heterogeneity was very high and this could potentially have been reduced through a greater pool of studies. Thirdly, this heterogeneity remains unexplained as we were unable to perform meta-regressions because there were not enough studies.

Conclusion
The 2-year meta-analytical prognostic accuracy of the CAARMS in predicting psychosis is only acceptable. A refined prediction of psychosis risk is necessary to advance clinical research in this area.

Disclosure of interest
The authors declare that they have no competing interest.