Feasibility of using a smartphone app to assess early signs, basic symptoms and psychotic symptoms over six months: A preliminary report.

BACKGROUND
Psychosis relapses are common, have profound adverse consequences for patients, and are costly to health services. 'Early signs' have been used to predict relapse, in the hope of prevention or mitigation, with moderate sensitivity and specificity. We investigated the feasibility and validity of adding 'basic symptoms' to conventional early signs and monitoring these using a smartphone app.


METHODS
Individuals (n = 18) experiencing a relapse within the past year were asked to use a smartphone app ('ExPRESS') weekly for six months to report early signs, basic symptoms and psychotic symptoms. Above-threshold increases in app-reported psychotic symptoms prompted a telephone interview (PANSS positive items) to assess relapse.


RESULTS
Participants completed 65% of app assessments and 58% of triggered telephone interviews. App items showed high concurrent validity with researcher-rated psychotic symptoms and basic symptoms over six months. There was excellent agreement between telephone and face-to-face assessments of psychotic symptoms. The primary relapse definition, based on telephone assessment and casenotes, compared well with a casenote-only definition but had better specificity. Mixed-effects models provided preliminary evidence of concurrent and predictive validity: early signs and basic symptoms were associated with most app-assessed psychotic symptom variables the same week and with several psychotic symptom variables three weeks later; adding basic symptoms to early signs improved model fit in most of these cases.


CONCLUSIONS
This is the first study to test a smartphone app for monitoring early signs and basic symptoms as putative relapse predictors. It demonstrates that weekly app-based monitoring is feasible, valid and acceptable over six months.


Introduction
Psychosis relapses are associated with worse outcomes by almost every measure (Almond et al., 2004; Andrew et al., 2012; Appleby, 1992; Birchwood et al., 2000; Gumley and Schwannauer, 2006; Iqbal et al., 2000; Maclean, 2008; Wiersma et al., 1998; Wu et al., 2005). Relapse signatures, idiosyncratic combinations of warning signs, have been used to predict relapse in the hope of prevention or mitigation but have only moderate sensitivity and specificity (Eisner et al., 2013). To improve predictive power, we investigated adding basic symptoms (Schultze-Lutter et al., 2007a) to pre-existing putative predictors (conventional early signs) (Birchwood et al., 1989) and using a smartphone application ('app') to facilitate prompt identification of these.
Incorporating both basic symptoms and conventional early signs of relapse into personalized relapse signatures will likely achieve better relapse prediction than has previously been demonstrated (Eisner et al., 2013). Basic symptoms are subtle, subjective changes in individuals' experiences of themselves (e.g. mild cognitive problems) and the world around them (e.g. more vivid colors) which predict first episodes of psychosis (Fusar-Poli et al., 2012; Schultze-Lutter et al., 2007b). There is preliminary evidence that basic symptoms also predict relapses of psychosis (Bechdolf et al., 2002; Eisner et al., 2018; Gaebel and Riesbeck, 2014) but there are no comprehensive, prospective studies examining this. A well-powered, methodologically sound, prospective study to establish whether basic symptoms predict relapses of psychosis is needed. We tested the feasibility of carrying out such a study using a smartphone app, ExPRESS (Experiences of Psychosis Relapse: Early Subjective Signs) (Eisner et al., 2019), which collects weekly assessments of early signs, basic symptoms and psychotic symptoms.
A number of symptom monitoring apps prospectively assessing symptom course have been tested (Ainsworth et al., 2013; Barnett et al., 2018; Ben-Zeev et al., 2014; Ben-Zeev et al., 2017; Ben-Zeev et al., 2018; Bucci et al., 2018; Kumar et al., 2018; Meyer et al., 2018; Niendam et al., 2018; Palmier-Claus et al., 2012). Most monitor individuals' current mental state rather than aiming to elicit symptoms predictive of relapse. Two assessed symptoms overlapping somewhat with conventional early signs (e.g. anxiety, confusion) but did not measure relapse (Niendam et al., 2018). A small pilot study (Barnett et al., 2018) collected both self-reported early signs via an app and relapse as an outcome, but the focus of the published paper was mainly on the predictive value of passively collected data, with limited details of early signs data reported. Although a number of studies have reported good correspondence between passively collected data and outcomes (Barnett et al., 2018; Meyer et al., 2018; Wang et al., 2016; Wang et al., 2017), this was not the focus of the current study. Instead, we aimed to further refine the predictive value of app-based monitoring by adding basic symptoms to conventional early signs as putative relapse predictors.
The current study tested the methodology for a planned large-scale study examining whether basic symptoms are valuable relapse predictors. There were four specific aims: i) to explore the feasibility of using a smartphone app (ExPRESS) for weekly monitoring of early signs, basic symptoms, psychotic symptoms and relapse over six months; ii) to assess the concurrent and preliminary predictive validity of using personalized relapse signatures integrating basic symptoms and conventional early signs; iii) to examine the validity of an operational definition of relapse using a combination of smartphone app assessment, verbal telephone assessment and casenote examination; iv) to examine the acceptability of the study procedures.

Study design
This study consisted of three phases. First, cross-sectional assessments characterized the sample and checked eligibility for the next phase. Second, eligible participants used ExPRESS for six months and received telephone calls from the researcher (prospective, longitudinal phase). Finally, after six months the acceptability of the study procedures was explored using qualitative interviews. The study was carried out in accordance with the Declaration of Helsinki (World Medical Association, 2013), ethical approval was obtained from Greater Manchester West Research Ethics Committee (14/NW/1471) and the study was registered (ClinicalTrials.gov: NCT03558529).

Participants
Participants were recruited from three Mental Health Trusts in North-West England between June 2015 and June 2016. Inclusion criteria were: schizophrenia spectrum diagnosis (Diagnostic and Statistical Manual of Mental Disorders, 4th Edition; DSM-IV) (Sheehan et al., 1998); ≥1 acute psychotic episode in the past year (admission to crisis team or hospital; or exacerbation of psychotic symptoms lasting ≥2 weeks and leading to a change in management), or ≥2 episodes in the past 2 years, including index episode; currently prescribed antipsychotic medication; age over 18 years; fixed abode; fluent in English; sufficiently stable to take part (able to complete screening assessment); no current alcohol or drug dependence (Structured Clinical Interview for DSM-IV) (First et al., 1996); informed consent. To progress beyond baseline assessment, individuals must have reported basic symptoms which began or increased prior to a recent episode of psychosis.

ExPRESS app
App design is detailed elsewhere (Eisner et al., 2019), with screenshots provided in Supplementary Fig. 1. Briefly, ExPRESS is an Android smartphone app which prompts participants once a week to answer a personalized set of questions regarding psychotic symptoms (PANSS positive items) (Ainsworth et al., 2013; Kay et al., 1987; Palmier-Claus et al., 2012), mood symptoms (Calgary Depression Scale) (Addington et al., 1993), basic symptoms (Basic Symptoms Checklist, BSC) (Eisner et al., 2019) and early signs of relapse (Early Signs Scale, ESS) (Birchwood et al., 1989) within the past week. Participants have a 24-h window each week to respond to the question set. Responses are uploaded automatically to a secure server, accessible to the research team via a password-protected web interface. Weekly self-reports were deemed frequent enough for early signs to be meaningfully detected but not so frequent as to overburden participants: beta-test participants considered weekly app use acceptable but would not have wanted it to be more frequent (Eisner et al., 2019).

Baseline assessments
An overview of all baseline and follow-up assessments is provided in Fig. 1. At baseline, the Schizophrenia Proneness Instrument Adult Version interview (SPI-A) (Schultze-Lutter et al., 2007a) identified whether participants had experienced basic symptoms beginning or increasing prior to their most recent psychotic episode. Early signs (ESS) (Birchwood et al., 1989) were assessed for the same period.
Psychotic symptoms and mood symptoms during the previous week were assessed using the PANSS (Kay et al., 1987), PSYRATS (Psychotic Symptom Rating Scales) (Haddock et al., 1999) and Hospital Anxiety and Depression Scale (Zigmond and Snaith, 1983). Substance use was assessed using a 4-point scale (Tarrier et al., 2006), medication adherence using a 7-point scale (Kemp et al., 1996) and cognitions related to imminent psychosis relapse using the Fear of Recurrence Scale (FoRSE) (Gumley et al., 2015). Demographic information was gathered using a standard questionnaire. Assessors were trained and supervised by senior colleagues. Mean intra-class correlations (ICC) compared to gold standard PANSS ratings were excellent (positive = 0.92; negative = 0.83; general = 0.89; total = 0.92).

Six month longitudinal app-use phase
Using SPI-A and ESS assessments, the researcher and participant defined a 'relapse signature' combining early signs and basic symptoms. Items from the participant's relapse signature were entered into the ExPRESS app so that individuals could monitor a personalized set of early signs. Participants were trained on ExPRESS and asked to use it weekly for 6 months or until relapse, whichever was sooner. Those owning an Android smartphone used their own phones; the remaining participants used a study phone. The latter received weekly text messages to their own phones reminding them to use ExPRESS.
Participants were telephoned by the researcher (weekly for four weeks; monthly thereafter) to encourage participation and troubleshoot any difficulties with app use. During the 3-month telephone call, the researcher assessed the PSYRATS, five PANSS positive items (delusions, hallucinations, suspiciousness, grandiosity, conceptual disorganization), and a subset of SPI-A items (Cognitive Disturbances (COGDIS) and Cognitive-Perceptive (COPER)) (Schultze-Lutter et al., 2007a). This was to screen for additional delusions or overlooked relapses and to check app item validity.
The primary relapse definition (Wunderink et al., 2007) (Supplementary Fig. 2) required a symptom increase for ≥1 week resulting in a management change (casenote-reported medication change or increased observation by the clinical team, including admission). Symptom increase criteria, assessed via PANSS telephone interview (five items: delusions, hallucinations, suspiciousness, grandiosity, conceptual disorganization), were: for remitted individuals, an increase to ≥4 or an increase of ≥2 points (whichever was higher) on any item; for non-remitted individuals with all baseline PANSS positive items <5, at least one item ≥5; for non-remitted individuals with ≥1 baseline item ≥5, an increase to ≥4 or an increase of ≥1 point (whichever was higher) on any item. Individuals' remission status was initially determined from baseline PANSS using standard remission severity criteria (Andreasen et al., 2005). To identify participants remitting during the app-use phase, we modified these criteria to accommodate app content: participants scoring ≤3 for two consecutive weeks on all app-assessed psychotic symptom items (delusions, hallucinations, grandiosity, suspiciousness; scaled the same as the corresponding PANSS items) were classified as remitted.
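The three-branch symptom-increase rule can be stated compactly in code. The sketch below is illustrative only (the function name, data layout and item labels are ours, not the study's scoring software):

```python
def symptom_increase(baseline, current, remitted):
    """Apply the study's symptom-increase criteria to PANSS positive
    items (each scored 1-7). `baseline` and `current` map item name to
    score; `remitted` is the participant's remission status."""
    all_baseline_below_5 = all(score < 5 for score in baseline.values())
    for item, base in baseline.items():
        cur = current[item]
        if remitted:
            # Remitted: increase to >=4 or by >=2 points, whichever is higher.
            if cur >= max(4, base + 2):
                return True
        elif all_baseline_below_5:
            # Non-remitted, all baseline items <5: at least one item >=5.
            if cur >= 5:
                return True
        else:
            # Non-remitted with >=1 baseline item >=5:
            # increase to >=4 or by >=1 point, whichever is higher.
            if cur >= max(4, base + 1):
                return True
    return False
```

A positive result corresponds to meeting the symptom increase criteria, which triggered the repeat telephone assessment one week later.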
PANSS telephone interviews were triggered when app-reported psychotic symptoms exceeded a pre-specified threshold (initially equivalent to the symptom increase criteria). If symptom increase criteria were met, the telephone assessment was repeated one week later to check duration. If app-reported symptoms exceeded the threshold for two consecutive weeks without any symptom increase being detected during the resultant telephone calls, the threshold for receiving a call was recalibrated: for each app-assessed symptom, the new threshold was one point above the average of the previous four app responses.
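The recalibration rule is simple enough to express directly; this is a sketch of the rule as described, not the app's actual implementation:

```python
def recalibrated_threshold(previous_responses):
    """Recalibrated alert threshold for one app-assessed symptom item:
    one point above the mean of the previous four weekly responses."""
    last_four = previous_responses[-4:]
    return sum(last_four) / len(last_four) + 1
```

For example, a participant with stable residual symptoms persistently scoring 4 on an item would, after recalibration, only trigger a call on reaching 5.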
Two aspects of the above relapse assessment procedure were added seven months into the study. Firstly, the original definition did not account for remission status; identical symptom increase criteria were applied to non-remitted and remitted individuals. Secondly, the telephone call threshold recalibration was initially absent, so participants with high residual symptoms were telephoned every week. Following these changes, four participants were deemed remitted at baseline, three remitted during the app-use phase, four had their telephone call threshold recalibrated and five were unaffected.
The secondary relapse definition, assessed using casenotes alone, required a symptom increase for ≥1 week resulting in a management change. For each participant, the relapse start date was recorded, with verbatim extracts from casenotes describing changes in symptoms and management. The researcher conducting casenote screening was trained to protocol (Barrowclough et al., 2010); ratings on reliability cases showed perfect correspondence with gold standard assessors (relapse presence/absence kappa = 1.00; relapse start date ICC = 1.00).

Qualitative interviews and final assessments
On longitudinal phase completion or dropout, face-to-face assessments (SPI-A; PANSS positive items) and qualitative interviews were conducted. Qualitative interviews were brief (average 6 min) and audio-recorded; the topic guide covered reasons for participation, researcher support, acceptability of telephone calls and financial reimbursement, and study highlights and lowlights. PANSS positive items were assessed via telephone within three days prior to the face-to-face assessment for validity checks.

Participant payment
Participants received: £10 per completed study phase; £10 monthly to cover phone credit (longitudinal participants); £5 per telephone interview (last 6 months of the study only; to test whether engagement increased).

Statistical analysis
Analyses were conducted in Stata (StataCorp, 2015), with bootstrapping where appropriate, and considered statistically significant at p < 0.05. App-assessed items scored on a 4-point scale were linearly transformed to a 7-point scale to aid comparison with other measures.
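The exact endpoint mapping of this transformation is not stated in the text; assuming the 4-point scale's extremes (1 and 4) map to the 7-point scale's extremes (1 and 7), the linear rescaling is y = 2x − 1:

```python
def to_seven_point(score):
    """Linearly rescale a 1-4 app item score to the 1-7 PANSS-style range
    (assumed endpoint mapping: 1 -> 1, 4 -> 7; intermediate points land
    at 3 and 5)."""
    return 1 + (score - 1) * (7 - 1) / (4 - 1)
```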

ExPRESS weekly monitoring: feasibility
Feasibility was assessed using descriptive statistics summarizing: recruitment to baseline assessments, recruitment to longitudinal phase (proportion of baseline participants eligible and consenting), app engagement (percentage of assessments completed), timing of app responses across the 24-hour response window, proportion of standard study telephone calls and telephone call PANSS assessments completed, and proportion of study phones returned. The 6-month relapse rate was noted, to inform future power calculations. The pattern of app completion during the app-use phase was examined in a mixed-effects model with a random effect of participant and a fixed-effect of time. Effects of baseline variables on percentage app completion were examined using Spearman's correlation (continuous variables), Mann-Whitney or Kruskal-Wallis test (categorical variables).

Relapse signatures (basic symptoms and early signs): validity
To measure concurrent validity, the numbers of app-reported basic symptoms during the first three and the full six months of app use were compared with retrospective telephone (3 months) and face-to-face (6 months) researcher-rated SPI-A assessments using ICCs (two-way mixed, absolute agreement, single measures). Preliminary predictive validity was assessed in two ways. Firstly, mixed-effects models were estimated, with app-assessed psychotic symptoms as the dependent variable, a fixed-effect of early signs and/or basic symptoms and a random effect of participant. Likelihood ratio tests explored whether adding basic symptoms to early signs improved model fit. Secondly, patterns of basic symptoms and early signs were examined graphically in the participants meeting full or partial relapse definitions.
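The ICC variant used here and in the relapse-validity analyses (two-way model, absolute agreement, single measures; ICC(A,1) in McGraw and Wong's taxonomy) can be computed directly from the ANOVA mean squares. A minimal pure-Python sketch, illustrative rather than the study's Stata code:

```python
def icc_a1(ratings):
    """ICC(A,1): two-way model, absolute agreement, single measures.
    `ratings` is a list of n subjects, each a list of k raters' scores."""
    n, k = len(ratings), len(ratings[0])
    grand = sum(x for row in ratings for x in row) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between-subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between-raters
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + (k / n) * (ms_cols - ms_err))
```

Unlike a consistency ICC, this form penalizes systematic differences between the two assessment modes (e.g. app scores running uniformly higher than researcher ratings), which is the relevant question when checking whether app items can substitute for researcher assessment.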

Relapse definition: validity
Validity was examined by comparing: 6-month relapse rates using primary and secondary relapse definitions (kappa); telephone-interview PANSS items and the same items assessed face-to-face (two-way mixed, absolute agreement, single measures ICC); and researcher-rated symptoms and app-reported symptoms. For the latter comparison, the researcher-rated symptom variables came from a face-to-face interview where available, or a telephone interview otherwise. Spearman's correlation was calculated to aid comparison with previous studies (Palmier-Claus et al., 2012) but does not account for the nested data structure. Therefore, mixed-effects models were constructed, with app-reported symptoms as the dependent variable, a fixed-effect of researcher-rated symptoms and a random effect of participant. The fixed-effect coefficient can be interpreted as the average change in app-reported symptoms for a 1-point change in researcher-rated symptoms.
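Agreement between the binary relapse classifications can be summarized with Cohen's kappa. A sketch (our own helper, not the study's code); with 2 participants relapsing on both definitions, 1 on the secondary definition only and, assuming the remaining 15 of the 18 longitudinal participants relapsing on neither, it gives ≈0.77, close to the reported 0.76:

```python
def cohens_kappa(a, b):
    """Cohen's kappa for two paired categorical ratings (here: binary
    relapse / no relapse under the primary and secondary definitions)."""
    n = len(a)
    p_obs = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    labels = set(a) | set(b)
    # Chance agreement from each rating's marginal distribution.
    p_exp = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical per-participant vectors matching the counts reported later.
primary = [1, 1] + [0] * 16       # relapse under the primary definition
secondary = [1, 1, 1] + [0] * 15  # relapse under the secondary definition
```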

Study procedures: acceptability
We used the framework method (Gale et al., 2013) to analyze verbatim transcriptions of qualitative interviews. The research team developed the initial analytical framework using the topic guide and independently coded two of the first three transcripts each, updating the framework as necessary. The first author systematically coded the remaining transcripts and charted the data into a framework matrix to aid discussion and interpretation by the whole team.

Sample characteristics
Demographic and clinical characteristics of longitudinal participants are shown in Table 1. The main reasons for non-recruitment were ineligibility, declining to participate, or not being given study information by staff. Most baseline participants (18/22) were eligible for and consented to the longitudinal phase, and engagement was high: 78% completed ≥33% of app assessments and 72% completed ≥50%. Most longitudinal participants (89%) completed a qualitative interview. Thirteen participants used ExPRESS on a study phone; all returned the phone in good condition except one, who returned it to a clinician, who lost it.

ExPRESS weekly monitoring: feasibility
Participants completed 65% of app assessments and 65% of supportive telephone calls. App-reported symptoms increased above the telephone interview threshold 31 times, of which 18 (58%) resulted in completed PANSS positive telephone interviews; participants were unavailable for the remaining 13. The distribution of app completion across the sample is shown in Fig. 3. Three clusters are apparent: half the sample completed ≥90% of app assessments, four completed 60-70% and four completed <13%. The remaining participant was unexpectedly abroad for several months, precluding app completion; while in the UK, their app completion was 60%. Supplementary Fig. 3 shows app response timing across the 24-hour response window; most responses occurred in the first 12 h, with a substantial peak immediately after the initial alert (1.30 pm). Participants responded to fewer prompts as the study progressed (OR = 0.89 per week of follow-up; p < 0.001; Supplementary Fig. 4). Percentage app completion was significantly and inversely correlated with baseline depression and fear of relapse, with baseline anxiety approaching significance and all other baseline variables non-significant (Table 1).

Descriptive statistics
The mean number of basic symptoms reported at baseline as having begun or increased prior to a recent psychotic episode was 6.4 (sd = 4.3, range = 1-17); mean ESS score for this period was 84.4 (sd = 20.6, range = 41-121).

Prediction of psychotic symptoms
In mixed-effects models (Table 3), both early signs and basic symptoms were significantly associated at the same time point with all psychotic symptom measures except grandiosity; adding basic symptoms to early signs improved model fit in three cases.
One week later: early signs significantly predicted suspiciousness; neither early signs nor basic symptoms predicted any other psychotic symptom variable.
Two weeks later: basic symptoms approached significance in predicting suspiciousness; no other associations at this lag were significant.
Three weeks later: early signs significantly predicted psychotic symptoms and hallucinations; basic symptoms significantly predicted psychotic symptoms and delusions, with hallucinations and suspiciousness approaching significance; adding basic symptoms to early signs improved model fit for psychotic symptoms and delusions, with suspiciousness approaching significance.

Prediction of relapse
Graphs of app-assessed items for participants meeting full or partial relapse definitions (Supplementary Fig. 5) indicate greater variability over time in severity of app-assessed early signs and basic symptoms than of app- or researcher-assessed psychotic symptoms. Since early signs and basic symptoms respond more than psychotic symptoms to relapse triggers and treatment, this probably signifies greater sensitivity to underlying changes rather than more random noise.

Relapse definition: validity
Two participants met both primary and secondary relapse definitions and one met the secondary definition alone, indicating substantial agreement (kappa = 0.76, p = 0.003). The relapse meeting the secondary but not the primary definition appeared to be a false positive (see Supplementary Textbox 1). Agreement between five PANSS positive items assessed by telephone and the same items assessed face-to-face was extremely high (Table 2), with ICCs ranging from 0.94 to 0.96 (p < 0.001). High agreement was also demonstrated for the PSYRATS delusions (ICC = 0.97, p < 0.001) and hallucinations (ICC = 0.89, p < 0.001) subscales. The correlation between researcher-rated and app-reported symptoms was high, with Spearman's rho ranging from 0.80 to 0.87 (p < 0.001; Table 2). Fixed-effects coefficients from mixed-effects models ranged from 0.66 to 1.08 (p < 0.05), with all confidence intervals including 1.00.

Study procedures: acceptability
We report the acceptability of study participation (Supplementary Table 1). A detailed analysis of ExPRESS's acceptability is reported elsewhere (Eisner et al., 2019). Participants gave various reasons for taking part, including altruism, wanting to increase knowledge about psychosis, curiosity about the research, feeling it might help them personally and previous positive research experiences. One individual, initially attracted by the financial reimbursement, became genuinely interested in the study. Several participants reported liking answering the questions, enjoying their novelty and finding them easy to answer even during difficult weeks. Others enjoyed the normalizing effect of ExPRESS, satisfaction of helping with research and increased understanding of their illness.
All participants found study telephone interviews acceptable and felt they received enough researcher support, with several mentioning that the reminder texts were helpful. Participants commented that the telephone calls were not intrusive and their frequency was appropriate; some found them encouraging and reassuring. Views about whether financial reimbursement for telephone interviews was needed were mixed. Although not specifically asked about the baseline interview, two participants mentioned finding it long and a little stressful initially; another commented that he found it acceptable.

Discussion
This study demonstrates the feasibility and acceptability of using an app for weekly monitoring of early signs, basic symptoms, psychotic symptoms and relapse over a six-month period, alongside telephone calls from a researcher. We show preliminary evidence of concurrent and predictive validity: early signs and basic symptoms were separately associated with most app-assessed psychotic symptom variables the same week and with several psychotic symptom variables three weeks later, and adding basic symptoms to early signs improved model fit in most of these cases. App items showed high concurrent validity with researcher-rated psychotic symptoms and basic symptoms over six months. There was also excellent agreement between telephone and face-to-face assessments of psychotic symptoms. The primary definition of relapse, based on telephone interviews and casenotes, compared well with a casenote-only definition but had better specificity.
Since this study has the longest app-use phase of any study to date of a symptom monitoring app in a sample with established psychosis, participants' app engagement is of key interest. Participants completed 65% of app assessments, with 78% completing ≥33% of assessments. Although lower than in studies with a follow-up ≤2 months (Ainsworth et al., 2013; Ben-Zeev et al., 2014; Meyer et al., 2018; Palmier-Claus et al., 2012), this compares favorably with studies with 3-month or 5-month follow-up periods. Unlike previous studies (Ainsworth et al., 2013; Ben-Zeev et al., 2014; Bucci et al., 2018; Kumar et al., 2018; Palmier-Claus et al., 2012), we used weekly (rather than daily) app assessments, which appear to be better tolerated over longer follow-up periods. However, the only study with a longer follow-up (14 months) (Niendam et al., 2018) included daily app assessments and averaged 69% app completion in a clinical high risk and recent onset psychosis sample. Its participants were paid per completed assessment, which likely increased engagement. Their young age may also have engendered higher app completion, since younger participants are likely to be more familiar with smartphone apps than our older sample with established psychosis (Bonet et al., 2018). Although neither we nor others (Meyer et al., 2018; Palmier-Claus et al., 2012) found an effect of age on app completion, our sample was possibly too small or had an insufficient age range to detect this.
Other predictors of app engagement have been examined but a consistent picture is yet to emerge (Killikelly et al., 2017). Studies have reported that higher positive (Meyer et al., 2018; Palmier-Claus et al., 2012), negative (Meyer et al., 2018) or agitation/mania symptoms predicted lower app engagement, whereas the current study found a significant effect of depression and fear of relapse. Arguably, those who are more fearful of relapse and more depressed are at greater risk of relapse (Conley, 2009; Gumley et al., 2015), more likely to avoid help-seeking (Gumley and Park, 2010) and thus more difficult for services to engage in treatment. If symptom monitoring apps become commonplace, clinicians must recognize that these may not suit everyone and take additional steps to engage such individuals. Interestingly, neither we nor others found that using a study phone significantly reduced app engagement, despite qualitative feedback suggesting that participants prefer their own phones (Ainsworth et al., 2013; Eisner et al., 2019; Palmier-Claus et al., 2012). The return rate of study phones in the current study was excellent (92%) and comparable to previous studies (Biagianti et al., 2017; Granholm et al., 2012), contrasting with clinicians' fears that patients might lose or sell study phones (Berry et al., 2017).
This study provides a novel approach to operationally defining relapse by combining telephone interviews and casenotes. While numerous relapse definitions exist, these rely on face-to-face assessments or solely on casenotes (Eisner et al., 2013; Falloon et al., 1983; Gleeson et al., 2010; Olivares et al., 2013). We suggest that remote relapse assessment is more easily integrated into participants' lives, decreasing both participant and researcher burden; this is likely to increase engagement and allow more frequent, long-term monitoring. Our operational relapse definition compared well with a casenote-only definition but had better specificity, since the latter generated a false positive. Conversely, however, our operational definition may have generated false negatives: since the proportion of unanswered telephone calls was high (42%), some symptom increases may have been missed. As the first study comparing telephone and face-to-face interview PANSS items, our finding of extremely high agreement between these is encouraging but needs replicating in a larger sample. Nevertheless, it suggests that our definition may be comparable to other relapse definitions that use PANSS positive items. We found a strong association between app-reported and researcher-rated psychotic symptoms, replicating previous findings (Palmier-Claus et al., 2012) and suggesting that symptom monitoring apps are a valid means of assessing symptom course. Agreement between app- and researcher-assessed basic symptoms was also excellent over 6 months, improving upon previous self-report measures of basic symptoms, which show poor concurrent validity (Mass et al., 1997; Michel et al., 2016).
(Table 2. Comparison of three modes of assessing psychotic symptoms: face-to-face interview, telephone interview, self-report using app items.)
Although we found low agreement at the 3-month assessment, key differences between the 3- and 6-month SPI-A assessments may explain this disparity: the 3-month researcher-rated assessment was conducted by telephone rather than face-to-face, involved a smaller sample and included only a subset of SPI-A items.
There are a number of important limitations. Firstly, the sample was small, precluding a definitive examination of predictive validity: sensitivity and specificity could not be calculated, making comparison with other early signs studies difficult. Nonetheless, mixed-effects models (using changes in psychotic symptoms as a proxy for relapse) and graphs of relapsers provided sufficient evidence of predictive validity to warrant further investigation. Secondly, the sample is unlikely to be representative, as the decline rate was high (42%), albeit comparable with other symptom monitoring app studies (Ainsworth et al., 2013: ~30%; Bucci et al., 2018: 30%; Ben-Zeev et al., 2014: 39%; Kumar et al., 2018: 44%; Niendam et al., 2018: 47%; Palmier-Claus et al., 2012: ~50%). Thirdly, researchers were not formally blind to participants' early signs or basic symptoms prior to conducting PANSS telephone interviews; in practice the researcher was not aware of either at the time of these calls, but future studies should use an independent, formally blinded assessor. Fourthly, regarding the mixed-effects models, the dependent variable (psychotic symptoms) and predictors (early signs and basic symptoms) are related experiences which were self-reported by participants using the same method (an app), increasing the risk of spurious correlations arising from common method variance and expectancy effects. Fifthly, while baseline assessments were conducted by a single assessor, a second assessor carried out half of the 6-month SPI-A assessments, and inter-rater reliability between them was not evaluated. Finally, since this was a feasibility study, protocol changes were made during data collection. However, these changes refined the relapse definition and reduced the number of telephone interviews.
In conclusion, we found that weekly app-based assessment of symptoms was feasible, acceptable and valid over a six-month period, offering support for a large-scale study using this methodology. More generally, these findings provide further evidence that symptom monitoring technology could be a valuable addition to routine mental health service delivery, since app-based symptom reports correspond closely to researcher-rated assessments and are well tolerated over an extended period of time.

Conflict of interest
Sandra Bucci is a director of Affigo CIC, a not-for-profit social enterprise company spun out of the University of Manchester in December 2015 to enable access to social enterprise funding and to promote ClinTouch, a symptom-monitoring app, to the NHS and public sector.

Contributors
All authors were involved in the design and ongoing management of the study, and contributed to drafts of this report. EE, the principal investigator, prepared the protocol, was responsible for the day-to-day running of the study, collected, analysed and interpreted the data and took a lead on writing the manuscript. SB and RD provided clinical and research supervision. RE advised on statistical aspects of the study design and analysis. NB provided maternity cover.

Role of the funding source
The Medical Research Council did not play a role in the study design, data collection, data analysis, data interpretation, writing of the report or in the decision to submit the article for publication.