Predicting the onset of psychotic experiences in daily life with the use of ambulatory sensor data – A proof-of-concept study

Introduction
The advancement of eHealth technologies in recent years has allowed for multifaceted assessments of psychological and physiological parameters in everyday life to monitor mental health in real time. Potentially, these real-time, daily-life assessments can be utilized to develop prediction models that identify states of vulnerability and opportunities for change (Nahum-Shani et al., 2015, 2018). At best, such predictive algorithms could be sufficiently precise to constitute a foundation for Just-In-Time Adaptive Interventions (JITAI) that automatically provide the right treatment at the right time when an individual needs help (Nahum-Shani et al., 2015, 2023; Wang and Miller, 2020). Initial developments of prediction algorithms in the context of psychosis relied on active engagement: participants needed to answer brief, randomly prompted ambulatory assessment questionnaires, and their answers were used to decide whether treatment was needed, i.e., whether a momentary intervention module should be prompted (Ben-Zeev et al., 2013, 2014).
Aside from developments of JITAIs specifically for psychosis, a recent review of JITAIs for harmful substance use by Perski et al. (2022) highlights that most JITAIs in this area relied on the active engagement of participants as well. A few studies that did not rely on active engagement used GPS location for passive monitoring with simple rules for intervention prompting, i.e., an intervention was triggered if a person entered a certain location (e.g., prior places associated with substance abuse; Perski et al., 2022).
Requiring participants to continuously and actively engage with questionnaires to identify states of vulnerability is time-consuming and may be too cumbersome over prolonged periods (Juarascio et al., 2018; Nahum-Shani et al., 2018). A possible solution to reduce user burden might be to rely on passively collectible parameters to detect states of vulnerability (Juarascio et al., 2018; Nahum-Shani et al., 2018).
To develop prediction models, it is necessary to identify passively collectible parameters that can be used to track states of vulnerability with high precision (Nahum-Shani et al., 2018). For psychotic experiences, physiological parameters that signify changes in the activity of the sympathetic nervous system (e.g., electrodermal activity [EDA]) and parasympathetic nervous system (e.g., heart rate variability [HRV]) are particularly promising. Alterations in autonomic arousal specific to psychotic experiences and the onset of psychosis play a prominent role in theoretical accounts for the etiology of psychosis. In fact, even the classic vulnerability/stress model by Nuechterlein and Dawson (1984) already built on a review of evidence that alterations in EDA in schizophrenia encompass an absence of reactivity to novel stimuli in some patients and a persistent increase in sympathetic arousal in others (Dawson and Nuechterlein, 1984). More recent theoretical accounts outlined putative dynamic associations between sympathetic changes (i.e., EDA) and parasympathetic changes (i.e., HRV) that start to emerge in an at-risk state and are fully formed in acute phases of psychosis (Montaquila et al., 2015). According to this review, alterations in parasympathetic nervous system activity (i.e., HRV) in people with psychosis decrease their ability to effectively recover from stress, leading to a sympathetic dominance that is associated with positive symptoms.
Moreover, ambulatory assessment studies that combine Ecological Momentary Assessment (EMA) with continuous physiological monitoring indicate that these physiological changes are mirrored in state changes of autonomic arousal in daily life at the moment people with psychosis experience symptoms. For example, a 36-h EMA study with people with psychosis showed that a higher intensity of auditory hallucinations was associated with decreasing HRV in the five minutes leading up to the respective assessment (Kimhy et al., 2017). Another EMA study showed that EDA in the 10 min prior to and after a momentary symptom assessment can be used to distinguish between distressing vs. non-distressing hallucinations and delusions (Cella et al., 2019). Furthermore, an investigation of HRV and EDA changes prior to, at the moment of, and after momentary psychotic experiences in the daily lives of people with attenuated positive symptoms showed that even in people who do not meet the full criteria for psychosis, the onset of paranoid beliefs was preceded by decreases in HRV and co-varied with decreased HRV and increased EDA (Schlier et al., 2019). Taken together, this indicates that changes in autonomic arousal not only accompany psychotic experiences but also precede, and possibly trigger, their onset.
Given the evidence for temporal associations between HRV, EDA, and subsequent psychotic experiences, the next question is whether these parameters can predict psychotic experiences with sufficient accuracy for implementation in JITAIs. To maximize models' predictive accuracy, we need to select statistical methods that can optimally account for the relationship between these predictors and psychotic states. Previous literature was mainly interested in scientific explanation rather than precise prediction, so it has not yet been systematically tested which types of relationships (e.g., linear, non-linear) yield higher model performance. Hence, the kind of relationship and the complex interactions between EDA, HRV, and psychotic experiences that would maximize predictive accuracy are unknown. Since the occurrence of psychotic experiences may be built upon complex associations between passively collected variables (e.g., changes in autonomic arousal that need to be harmonized with current physical activity), we need to rely on algorithms that can account for and identify these complex associations on their own. One way to do so is by using machine learning (ML) algorithms such as random forests that automatically model complex interactions and non-linear relationships (Cutler et al., 2012; Hastie et al., 2009).
Building on the established relationships between changes in autonomic arousal and psychotic symptom onset, we aimed to investigate how accurately ML models can predict the onset of hallucination spectrum experiences (HSE) and paranoia in the daily lives of people with an increased likelihood of experiencing psychotic symptoms in a proof-of-concept study. Reanalyzing data from an ambulatory assessment study (Krkovic et al., 2018; Schlier et al., 2019), we first assessed the predictability of HSEs and paranoia experiences using models that relied solely on autonomic arousal, namely HRV, EDA, heart rate, and physical activity. Further, we investigated whether providing the ML models with additional information about people's lifetime experiences of psychotic symptoms would yield incremental value for prediction. All calculated ML models were tested for generalizability. This was done by comparing ML models trained with data that excluded vs. included data from the person for whom symptoms were to be predicted.

Procedure
Potential participants were prescreened for the lifetime prevalence of psychotic experiences. Participants who reported attenuated psychotic experiences (see section Screening) were invited to our laboratory for an extended baseline assessment (see Krkovic et al., 2018; Schlier et al., 2019). Next, every participant was provided with a Motorola Moto G smartphone from our lab for the duration of their participation, which allowed only the use of the movisens XS EMA application. Additionally, participants wore an EDA sensor at the inner wrist of their non-dominant arm and an electrocardiogram sensor on their chest. Participants were instructed to follow their usual daily activities during the subsequent 24-h EMA period. However, they were advised not to shower or engage in demanding physical activities to avoid erroneous or interrupted sensor readings. Between 9 am and 10 pm, participants were prompted to fill out brief EMA questionnaires in 20-min intervals (± 60 s random variation). Optimally, this resulted in 38 EMA prompts over 24 h.
The study was approved by the local ethics committee. All participants provided informed consent prior to participation. Participants were compensated with either a monetary reward or partial course credit.

Screening
We screened participants for attenuated psychotic symptom levels using the German version of the Community Assessment of Psychic Experiences (CAPE; Stefanis et al., 2002). The CAPE consists of 42 items assessing lifetime psychotic experiences in the general population on four-point Likert scales ranging from 0 = never to 3 = nearly always. We used the positive symptom frequency scale of the CAPE, composed of 20 items (e.g., "Do you ever feel as if some people are not what they seem to be?"). The German version has shown good validity and reliability (Schlier et al., 2015). Participants were invited to the study if they self-reported symptoms at or above the median of the sample of the German validation study (Med = 8.00; Schlier et al., 2015) in order to limit the sample to participants who are likely to have at least some degree of psychotic experiences during the 24-hour EMA interval.

Assessment of psychotic experiences
To assess HSEs, we used a shortened version of the Continuum of Auditory Hallucinations - State Assessment (CAHSA; Schlier et al., 2017) that included one item each for the four dimensions vivid daydreaming, intrusive thoughts, perceptual sensitivity, and auditory hallucinations (see Schlier et al., 2019, for the wording of items). On 11-point Likert scales (0 = not at all to 10 = very much), participants stated to what extent these items applied to them during the past 20 min. A mean score of HSE was calculated. To predict the occurrence of HSEs, we needed to distinguish assessments without HSE present from assessments where participants experienced some degree of HSE. To assess whether any degree of HSE was present, we calculated a reliable change index based on the within-subject Cronbach's alpha and the variance from this sample. All scores reliably different from the lower boundary of the mean score (0) were then marked as HSE present. Based on the calculations of the 95 % reliable change, we labeled assessments with CAHSA scores ≥ 2.33 as events in which HSEs were present and scores < 2.33 as events without HSEs (Schlier et al., 2019).
Paranoia was assessed with the three-item state version (Schlier et al., 2016) of the Paranoia Checklist (Freeman et al., 2005). For the items (1) "I need to be on my guard against others", (2) "People are trying to make me upset", and (3) "Strangers and friends look at me critically", participants stated to what extent these statements applied to them at the current moment on 11-point Likert scales (0 = not at all to 10 = very much). The scale showed an acceptable internal consistency (α = 0.74; Schlier et al., 2016). A mean score was calculated and then used to dichotomize assessments into those without and with paranoia present using the reliable change index. Assessments with paranoia scores ≥ 1.51 were labeled as events with paranoia present and < 1.51 as events without paranoia (Schlier et al., 2019).
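To make the dichotomization step concrete, the following is a minimal sketch in Python. The cut-offs of 2.33 (HSE) and 1.51 (paranoia) are those reported above; the threshold function assumes the standard Jacobson-Truax reliable change formula, and the data frame and column names are hypothetical, so the sketch is illustrative rather than the study's exact computation.

```python
import numpy as np
import pandas as pd

def reliable_change_threshold(sd: float, alpha: float, baseline: float = 0.0, z: float = 1.96) -> float:
    """Smallest score that differs reliably (95 %) from `baseline` (Jacobson-Truax formula; assumption)."""
    se_measurement = sd * np.sqrt(1 - alpha)     # standard error of measurement
    se_difference = se_measurement * np.sqrt(2)  # standard error of the difference score
    return baseline + z * se_difference

# Hypothetical EMA mean scores per prompt (not study data)
ema = pd.DataFrame({"cahsa_mean": [0.0, 1.5, 3.0, 0.7], "pcl_mean": [0.5, 0.9, 2.4, 0.0]})

# Dichotomize using the cut-offs reported in the text: 2.33 (HSE) and 1.51 (paranoia)
ema["hse_present"] = (ema["cahsa_mean"] >= 2.33).astype(int)
ema["paranoia_present"] = (ema["pcl_mean"] >= 1.51).astype(int)
```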

Assessment of physiological predictors
We used the movisens ecgMove ambulatory electrocardiogram to measure heart rate and HRV. HRV measures included a) HRV in the high-frequency band (0.15-0.40 Hz), b) HRV in the low-frequency band (0.04-0.15 Hz), c) the root mean square of successive differences between heartbeat intervals (RMSSD), and d) the standard deviation of the heartbeat intervals (SDNN). Additionally, the movisens ecgMove tracked physical activity by measuring the number of steps and movement acceleration (based on Van Someren et al., 1996).
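The time-domain HRV measures translate directly into code; as a point of reference, the sketch below computes RMSSD and SDNN from a series of interbeat (RR) intervals. The interval values are illustrative, not study data, and the frequency-domain measures (which require spectral analysis of the RR series) are omitted.

```python
import numpy as np

def rmssd(rr_ms: np.ndarray) -> float:
    """Root mean square of successive differences between heartbeat (RR) intervals."""
    return float(np.sqrt(np.mean(np.diff(rr_ms) ** 2)))

def sdnn(rr_ms: np.ndarray) -> float:
    """Standard deviation of the heartbeat (RR) intervals."""
    return float(np.std(rr_ms, ddof=1))

# Illustrative RR intervals (in ms) from a single one-minute window
rr = np.array([812.0, 798.0, 805.0, 790.0, 820.0, 815.0])
print(round(rmssd(rr), 1), round(sdnn(rr), 1))
```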
EDA was measured with an ambulatory EDA sensor, the movisens edaMove. Specifically, we calculated a) the average skin conductance levels, b) the average amplitude of skin conductance responses, c) the number of skin conductance responses, d) the average arousal, e) the average half recovery times, and f) the average rise times of skin conductance responses.
We calculated all the above-described measures in one-minute intervals and automatically corrected data for potential artifacts with the movisens DataAnalyzer.

Sample
To be included in the study, participants needed to achieve a CAPE positive symptom scale sum score ≥ 8 in the prescreening and be proficient in German. The exclusion criterion was a past diagnosis of a schizophrenia-spectrum disorder. Of the 292 persons entering the prescreening, we recruited 67 participants who achieved a CAPE positive symptom scale sum score ≥ 8. Individuals who completed less than half of the targeted 38 assessments (n = 5) were excluded from the analysis, resulting in a final sample of 62 participants. Participants were predominantly female (74.19 %), between 16 and 39 years old (M = 22.94; SD = 4.85), students (82.26 %), and of German nationality (90.32 %).

Data preprocessing
All analyses were carried out in R 4.2.1 and Python 3.8.6. The Python package scikit-learn 1.1.2 (Pedregosa et al., 2011) was used for the ML procedures.
First, we removed incomplete symptom assessments. We utilized the one-minute intervals of EDA, HRV, heart rate, and movement from the 20-min period prior to each assessment. Heart rate intervals were labeled as missing if they exceeded the maximal age-predicted heart rate for the respective participant according to the formula 208 − 0.7 × age (Tanaka et al., 2001). Missing data were present for heart rate, HRV variables, EDA variables, and movement, possibly due to random sensor errors or erroneous readings caused by strong movements. The final dataset contained 2030 assessments, of which 849 contained missing data for at least one physiological parameter in at least one one-minute interval (see supplement Table S1 for an overview of missing data per variable and Table S2 for a complete list of the variables used in this study). We imputed missing data via k-nearest-neighbor (k-NN) mean single imputation, in which the non-weighted mean of the three nearest neighbors replaced the missing data.
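To illustrate these two preprocessing steps, the sketch below applies the age-predicted maximum heart rate rule and then imputes missing values with scikit-learn's KNNImputer using uniform weights and three neighbors, which corresponds to an unweighted three-nearest-neighbor mean. The data frame and column names are hypothetical, and the study's own implementation may differ in detail.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

def max_predicted_hr(age: float) -> float:
    """Age-predicted maximal heart rate: 208 - 0.7 * age (Tanaka et al., 2001)."""
    return 208 - 0.7 * age

# Hypothetical feature matrix: one row per one-minute interval
df = pd.DataFrame({
    "age":   [22, 22, 30, 30, 25],
    "hr":    [75, 240, np.nan, 88, 70],
    "rmssd": [40, np.nan, 55, 38, 61],
})

# Label physiologically implausible heart rates as missing
df.loc[df["hr"] > df["age"].apply(max_predicted_hr), "hr"] = np.nan

# Replace missing values with the unweighted mean of the three nearest neighbors
imputer = KNNImputer(n_neighbors=3, weights="uniform")
df[["hr", "rmssd"]] = imputer.fit_transform(df[["hr", "rmssd"]])
```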
Since the outcome data were heavily skewed towards assessments classified as events without HSEs (72.96 %) and events without paranoia (79.70 %), we used random undersampling to balance classes for hyperparameter tuning. We randomly removed assessments from the majority class (no symptom experience) until both classes were equally frequent. Consequently, the data used for hyperparameter tuning contained equal numbers of assessments with and without symptom experiences.
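A minimal sketch of such random undersampling, assuming a pandas data frame with a binary label column (hypothetical names), could look as follows.

```python
import numpy as np
import pandas as pd

def random_undersample(df: pd.DataFrame, label: str, seed: int = 42) -> pd.DataFrame:
    """Randomly drop majority-class rows until both classes are equally frequent."""
    rng = np.random.default_rng(seed)
    counts = df[label].value_counts()
    majority = counts.idxmax()
    keep_majority = rng.choice(df.index[df[label] == majority], size=counts.min(), replace=False)
    balanced = pd.concat([df.loc[keep_majority], df[df[label] != majority]])
    return balanced.sample(frac=1, random_state=seed)  # shuffle the rows

# Hypothetical usage: tuning_data = random_undersample(ema_data, label="hse_present")
```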

Model calculation
For HSE and paranoia, hyperparameter tuning was conducted to estimate the best model configuration for (1) models containing only physiological data from EMA and (2) models containing physiological data and the baseline score of lifetime psychotic experiences (CAPE). All models were calculated with and without the participant ID to explore whether adding implicit information about the data's multilevel structure to the prediction algorithm affected prediction accuracy.
We used random forests to calculate all prediction models. Hyperparameter tuning was conducted using nested cross-validation. First, we split the data into eight outer folds, each consisting of a training and testing dataset. Subsequently, the training data for each outer fold were divided into five inner folds, likewise consisting of a training and validation dataset. These inner folds were used for hyperparameter tuning. The best hyperparameter configuration found in the inner folds was applied to the respective outer fold to evaluate the model's performance. During the hyperparameter tuning, we varied the maximum depth of decision trees (2-20 in steps of two), the number of decision trees built (100, 250, 500), the maximum share of features allowed per split (20 %, 40 %, 60 %, 80 %), and the minimum number of samples required at each leaf node (2-20 in steps of two). Further, we pruned decision trees after calculation by varying the degree of model complexity penalization using cost-complexity pruning (α = 0.000, 0.005, 0.010). Each decision tree was calculated using a bootstrap sample. We tuned hyperparameters by testing all possible configurations over our hyperparameter space and optimized for model accuracy. This configuration of the hyperparameter space is inspired by a simulation study (Mantovani et al., 2019) and the documentation of the scikit-learn package (Pedregosa et al., 2011).
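For concreteness, the following sketch sets up this hyperparameter space and the nested cross-validation with scikit-learn. The grid mirrors the values listed above; the fold construction (stratified, shuffled splits) and random seeds are assumptions, as the text does not specify how folds were assigned.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Hyperparameter space as described in the text
param_grid = {
    "n_estimators": [100, 250, 500],
    "max_depth": list(range(2, 21, 2)),
    "max_features": [0.2, 0.4, 0.6, 0.8],      # fraction of features considered per split
    "min_samples_leaf": list(range(2, 21, 2)),
    "ccp_alpha": [0.000, 0.005, 0.010],        # cost-complexity pruning strength
}

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)  # inner folds: tuning
outer_cv = StratifiedKFold(n_splits=8, shuffle=True, random_state=1)  # outer folds: evaluation

search = GridSearchCV(
    RandomForestClassifier(bootstrap=True, random_state=1),
    param_grid=param_grid,
    scoring="accuracy",
    cv=inner_cv,
)

# Nested cross-validation: the grid search (inner folds) is refit within each outer training fold.
# X, y = ...  # class-balanced feature matrix and labels (not shown here)
# outer_scores = cross_val_score(search, X, y, cv=outer_cv, scoring="accuracy")
```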
As each outer fold performed its own hyperparameter tuning in its inner folds, the best hyperparameter configuration varied between outer folds. For further analyses, we used the hyperparameter configuration from the model with the best outer fold performance. If model configurations yielded equal accuracy between multiple outer folds, we chose the model configuration with the lowest number of decision trees that needed to be calculated to optimize for prediction speed.

Model validation
To estimate prediction accuracy for the best model configuration, we conducted two performance evaluations for each model. First, we tested model performance using leave-one-assessment-out cross-validation (LOOCV), in which the model was built iteratively on the whole data except for one assessment, which was to be predicted. This procedure was repeated until each datapoint had been used once. Additionally, we tested model performance using leave-one-person-out cross-validation (LOPOCV), in which the model was built on the whole data excluding the data of one person. Again, each person was excluded once. Although the entire dataset was used once for testing, the training data used for model calculation were always class-balanced with the random undersampling procedure described above. Our primary outcomes to evaluate the predictive accuracy of the models were the sensitivity (i.e., the rate of identified symptom experiences relative to all symptom experiences), the balanced accuracy (BAC, i.e., the mean of sensitivity and specificity), and the positive predictive value (PPV, i.e., the rate of correctly identified symptom experiences relative to all instances classified as symptom experiences).
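As a point of reference, the sketch below shows how these metrics can be derived from a confusion matrix and how LOPOCV can be expressed as a grouped split; the names X, y, and participant_ids are placeholders rather than objects from the study.

```python
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import LeaveOneGroupOut

def evaluation_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV, and balanced accuracy from binary predictions."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "bac": (sensitivity + specificity) / 2,
    }

# LOPOCV as a grouped split: one participant is held out per fold
# for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups=participant_ids):
#     ...fit on the (undersampled) training fold, predict the held-out participant...
```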

Descriptive analysis
Participants completely filled out an average of M = 32.74 EMA prompts (SD = 4.81, range: 20-40). The overall sample showed an EMA compliance rate of 86.08 %.

Prediction of psychotic experiences
Table 1 shows the hyperparameter tuning results. Results from the LOOCV and LOPOCV are summarized in Table 2.
As can be seen, all prediction models that implicitly included participant ID achieved higher prediction accuracy. For the HSE prediction model based on physiological data and participant ID, we found an 80 % sensitivity to detect HSEs and a BAC of 80 %. The prediction model using physiological data, participant ID, and the CAPE scores yielded incremental gains (sensitivity = 81 %, BAC = 82 %). For paranoia, the prediction model using physiological data and participant ID yielded a 66 % sensitivity and a BAC of 66 %, whereas the prediction model including the CAPE scores showed higher accuracy (sensitivity = 72 %, BAC = 70 %). The BAC of all HSE and paranoia prediction models dropped to chance levels in the LOPOCV (51 %-58 %).
Finally, all HSE and paranoia prediction models yielded high negative predictive values (NPV, 83 %-92 %), indicating that most instances identified as no symptom experiences by our prediction models were correctly classified. The high NPV scores contrast with lower PPV scores, implying that only about one-third of all instances labeled as paranoia experiences (29 %-36 %) were correctly classified as such, whereas up to two-thirds of HSE classifications were correct (42 %-64 %).

Stability analyses
To control whether imputation inflated prediction accuracy, we removed all assessments from the dataset that contained at least one missing value (849 assessments) and recalculated the best prediction model (Physiology+CAPE+ID) for HSE and paranoia. Balanced accuracy scores decreased by 1 % for the HSE model (BAC = 81 %) and 5 % for the paranoia model (BAC = 65 %). Table S3 depicts the hyperparameter tuning results and Table S4 shows the LOOCV results for these analyses.
Further, we assessed whether prediction accuracy was inflated by the amount of data imputed at the level of individual assessments. A point-biserial correlation between the two variables (A) number of missing variables (that were subsequently imputed) per assessment and (B) the dichotomous variable incorrect (= 0) vs. correct (= 1) prediction of this assessment during LOOCV was non-significant, r(2028) = −0.02, p = 0.341, for the HSE model (Physiology+CAPE+ID). In contrast, the paranoia model (Physiology+CAPE+ID) yielded a significant, albeit minimal to small, correlation of r(2028) = −0.08, p < 0.001. This correlation was negative, indicating that prediction accuracy was reduced when more missing values needed to be imputed. Corresponding multilevel regression models (assessments nested in participants) testing the association between the number of missing values (dependent variable) and accuracy of prediction were non-significant (HSE: b = 3.74, t(2009) = 1.56, p = 0.119; paranoia: b = 1.03, t(1985) = 0.58, p = 0.562).
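For reference, a point-biserial correlation of this kind can be computed with scipy; the vectors below are hypothetical and only illustrate the shape of the analysis.

```python
import numpy as np
from scipy.stats import pointbiserialr

# Hypothetical per-assessment vectors (not study data)
n_imputed = np.array([0, 2, 5, 1, 0, 3, 4, 0])  # number of imputed variables per assessment
correct = np.array([1, 1, 0, 1, 1, 0, 1, 1])    # LOOCV prediction correct (1) vs. incorrect (0)

r, p = pointbiserialr(correct, n_imputed)
print(f"r = {r:.2f}, p = {p:.3f}")
```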

Exploratory analyses
Since Schlier et al. (2019) reported changes in autonomic arousal prior to and subsequent to symptom experiences, we explored whether the high false positive rate across all models can be explained by false classification of the assessments before and after symptom experiences. To test this, we relabeled all assessments before and after symptom experiences as symptom experiences, too, and re-evaluated all models using LOOCV and the hyperparameter configurations of our main analyses. In these analyses, PPV increased from 58 % to 65 % for the HSE prediction model using physiological data and participant ID and from 64 % to 74 % for the HSE prediction model additionally using CAPE scores. Similarly, the PPV of the paranoia prediction models increased from 32 % to 46 % for the prediction model using physiological data and participant ID and from 36 % to 52 % for the prediction model additionally using CAPE scores. Results from all exploratory prediction models are summarized in Table 3.
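A minimal sketch of this relabeling step, assuming a data frame sorted by participant and time with hypothetical column names, could look as follows.

```python
import pandas as pd

def relabel_neighbours(df: pd.DataFrame, label: str) -> pd.Series:
    """Also mark the assessments directly before and after a symptom experience as experiences."""
    def expand(s: pd.Series) -> pd.Series:
        return ((s == 1) | (s.shift(1) == 1) | (s.shift(-1) == 1)).astype(int)
    # Expand within each participant so that labels do not spill over between persons
    return df.groupby("participant_id", group_keys=False)[label].apply(expand)

# Hypothetical usage (df sorted by participant and time):
# df["hse_relabelled"] = relabel_neighbours(df, "hse_present")
```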

Discussion
This study showed that ML algorithms using physiological data indicative of autonomic arousal can predict HSEs and paranoia in a population sample with attenuated psychotic symptoms with promising accuracy. Overall prediction accuracy for psychotic experiences ranged between 63 % and 82 %. Including additional information in the form of self-reported lifetime frequency of psychotic experiences incrementally increased model accuracy. Overall, our prediction models in the LOOCV calculations were better able to accurately classify HSE and non-HSE episodes (BAC = 65 %-82 %) than to classify paranoia and non-paranoia episodes (BAC = 63 %-70 %). Furthermore, prediction accuracy was more stable for HSEs, as removing assessments with missing values reduced balanced accuracy by only 1 % for HSEs but by 5 % for paranoia. Overall, hallucinatory experiences might be predicted more accurately than delusions.
Although accuracy was high for the prediction models, the one metric with overall low performance in all prediction models was the PPV (29 %-64 %), meaning that assessments classified as symptom experiences by the prediction models were often false positives, especially for paranoia. If the algorithm were implemented to predict the emergence of symptom episodes, it would overestimate the total number of state symptom episodes. Depending on the ultimate goal of the prediction algorithm, this could lead to, for example, excessive costs (e.g., when the prediction algorithm is used to allocate expensive interventions and a high proportion of the people selected do not need them) or may even increase unfavorable outcomes (e.g., by exposing people who do not need any treatment for psychotic symptoms to an intervention with negative side effects). Bearing in mind the use of these prediction algorithms for Ecological Momentary Interventions (EMI) or JITAIs, high false positive rates bear, at the very least, a substantial risk of annoying users, which may lead to disengagement when intervention prompts occur too often or at the wrong moments.
However, our exploratory follow-up analyses indicated that false positives seem to cluster immediately prior to and after true symptom experiences. Thus, applying our predictive models in EMIs/JITAIs may still be feasible, provided a secondary algorithm is added to account for this temporal imprecision (e.g., an inhibitor algorithm to avoid repeated intervention prompts in short succession). However, additional sources of false positive classification also require attention. Some false positives could result from a lack of specificity for the type of symptom. For example, changes in autonomic arousal that falsely predicted HSEs or paranoia might have detected a closely related psychological symptom, such as general subjective stress. Such an assumption would be in line with the fact that (1) lower resting HRV of people at risk for psychosis has been linked to higher subjective stress levels in daily life (Bahlinger et al., 2020) and that (2) several EMA studies showed that HRV levels also accurately predict subjective stress (accuracy = 79 %-82 %; Kawada et al., 2021; Li et al., 2016). Prior EMA studies that investigated temporal associations between autonomic arousal and psychotic symptoms did not control for the presence of other psychological states such as negative affect, anxiety, or stress (Cella et al., 2019; Kimhy et al., 2017; Schlier et al., 2019). Thus, there is only limited prior data available as to how much psychotic symptoms and these states overlap. It is possible that the changes in autonomic arousal that covaried with psychotic symptoms could be attributed to other psychological states as well. Future studies need to include an assessment of a broader range of other symptoms to explore whether predictive models can be trained to differentiate between symptom domains (e.g., affective versus psychotic symptoms). If prediction models are able to detect a multitude of symptoms but face difficulties in distinguishing between them, ambulatory interventions that utilize predictive algorithms might need to provide several treatment options and allow for a self-guided choice of the most appropriate intervention with each prompt.
Although accuracy was high as long as model training included partial data from the person for whom the prediction was evaluated (LOOCV), the accuracy dropped close to chance levels when applied to completely new participants (LOPOCV). Apparently, our ML models needed an individual's own data to achieve sufficient accuracy. There are two possible explanations for this. Firstly, our sample consisted of 62 participants. Thus, it is unlikely that we assessed a representative range of the population with attenuated levels of psychotic symptoms and covered broad enough ranges of EDA, heart rate, and HRV in this population. If this explanation holds true, the low performance of our LOPOCV models would be remedied by increasing the sample size and diversity of the training dataset, eventually reaching generalizability to new cases as training data increase. Another explanation is that there might be too much heterogeneity in the subjective experience of symptoms and their characteristics. If this is the case, models built on data from the exact individual whose symptom episodes one wants to predict would be needed to achieve higher accuracy. Future research needs to test whether a sufficiently large dataset could provide the basis for a general prediction model that can be readily used with any new participant. Alternatively (and possibly more cost-effectively), practical applications might benefit from using a hybrid prediction model that starts with a global base model and adds a brief assessment period for each new user to individualize the model and optimize prediction accuracy.
A challenge was to deal with the inherent skewness of our dataset, primarily due to the limited occurrence of psychotic experiences within this sample. However, this challenge may not be unique to population samples with attenuated psychotic symptoms. Some degree of imbalance may be inherent to the task of developing prediction models for psychotic symptoms. In clinical samples, however, skewness is likely reversed, with an overrepresentation of symptom experiences; for example, up to 48 % of voice-hearers report experiencing their voices almost constantly (McCarthy-Jones et al., 2014). Moreover, even if a highly accurate prediction is achieved with such imbalanced data, this group of patients would end up being constantly prompted with interventions when a mere symptom-presence-based prediction model is used in subsequent JITAIs. This could turn a JITAI into a disruption rather than a helpful tool optimally tailored to the patient's needs. Consequently, prediction outcomes in some patients may need to be switched to a less frequent type of event that is also characterized by an especially high need for help (e.g., highly distressing hallucinations).

Limitations
Some limitations need to be considered. Firstly, due to technical limitations, specifically the ambulatory sensors' battery life, this study was limited to a 24-h assessment period. To maximize the number of assessments per person, we assessed psychotic experiences at short 20-min intervals. As already pointed out in a previous study from this project, assessing psychotic experiences for only 24 h might not capture the person's full range of psychotic experiences in daily life (Schlier et al., 2019).
Secondly, the current study investigated subclinical psychotic experiences, and it remains to be seen whether the prediction of psychotic symptoms can be achieved in clinical populations as well.
Thirdly, since we provided participants with smartphones to use for the EMA, we might have altered the participants' daily routines (van Berkel et al., 2017). Whereas the passive sensing central to the prediction algorithm operated on independent devices, the self-reported symptom assessments could have been affected. A recent study by Markowski et al. (2021) showed that using personal devices for EMA decreases the likelihood of missing prompts due to the device being turned off, compared to smartphones provided for an assessment. These suboptimal conditions for EMA compliance (and, consequently, ecological validity) arose because data collection took place before recommendations for personal phone usage were established. Future studies that develop prediction models using EMA should therefore prioritize the use of personal devices for data collection to increase compliance and optimize the ecological validity of outcome assessment.
Additionally, one issue with using autonomic arousal for the prediction of psychotic symptoms is that changes in autonomic arousal may be attributable to other factors as well, such as the consumption of caffeine (Barry et al., 2005, 2008) or physical exercise (Boettger et al., 2010). Our analyses controlled for exercise only by including physical activity metrics among the sensor variables. Controlling for other causes of changes in arousal within the mobile sensing environment is less feasible. Nevertheless, it might be possible to further increase accuracy, and particularly the PPV, by assessing these alternative causes in the EMA prompts of future studies, implementing them in model training as additional outcome categories, and thus optimizing the ML algorithm to distinguish between psychotic symptom episodes and other causes of changes in arousal.

Future directions
Demonstrating that psychotic symptom onset can be predicted using ML is promising for ambulatory interventions, as it advances the field towards a JITAI that dynamically reacts to impending symptom onset.
Since prediction models that included information about people's lifetime experiences of psychotic symptoms better predicted symptom experiences, accuracy may be increased further by integrating more general health-related baseline information into the ML models (e.g., information on baseline HRV, clinical interviews, emotion regulation abilities, and other current symptom levels) to better match the targeted individual to other people with similar arousal-symptom associations.
At the same time, the models' reliance on training data from the targeted individual may indicate that ML models which merely integrate that individual's data into a general training dataset are not optimal. Possibly, completely individualized, idiographic prediction models trained solely on the data from the targeted individual would perform even more accurately by better accounting for idiographic associations between predictor variables and psychotic symptom experiences. Future studies need to explore this using sufficiently large datasets per participant to build completely individualized prediction models for each person.
To conclude, if these promising results regarding the accurate prediction of psychotic symptoms can be replicated or even improved in patient samples, the idea of fully automated just-in-time mobile interventions could soon become a reality.

Financial support
This research received no funding/grant from any funding agency, commercial or not-for-profit sectors.


Table 1
Hyperparameter tuning results and final hyperparameter configuration for each machine learning model.
Note. α = alpha value for cost-complexity pruning; CAPE = Community Assessment of Psychic Experiences; Physiology = model based on only physiological data; Physiology+ID = model based on physiological data and participant ID; Physiology+CAPE = model based on physiological data and CAPE positive symptom scale scores; Physiology+CAPE+ID = model based on physiological data, CAPE positive symptom scale scores, and participant ID.

Table 2
Cross-validation results of the calculated machine learning models to predict HSEs and paranoia. Results are displayed in relative frequencies. CAPE = Community Assessment of Psychic Experiences; Physiology = model based on only physiological data; Physiology+ID = model based on physiological data and participant ID; Physiology+CAPE = model based on physiological data and CAPE positive symptom scale scores; Physiology+CAPE+ID = model based on physiological data, CAPE positive symptom scale scores, and participant ID; Sens = sensitivity; Spec = specificity; PPV = positive predictive value; NPV = negative predictive value; BAC = balanced accuracy.

Table 3
Results of LOOCV machine learning models that treated assessments prior to and after symptom onset as symptom experiences as well.