Measurement properties of tools used to assess suicidality in autistic and general population adults: A systematic review

Adults diagnosed with autism are at significantly increased risk of suicidal thoughts, suicidal behaviours and dying by suicide. However, it is unclear whether any validated tools are currently available to effectively assess suicidality in autistic adults in research and clinical practice. This is crucial for understanding and preventing premature death by suicide in this vulnerable group. This two stage systematic review therefore aimed to identify tools used to assess suicidality in autistic and general population adults, evaluate these tools for their appropriateness and measurement properties, and make recommendations for appropriate selection of suicidality assessment tools in research and clinical practice. Three databases were searched (PsycInfo, Medline and Web of Knowledge). Four frequently used suicidality assessment tools were identified, and subsequently rated for quality of the evidence in support of their measurement properties using the COSMIN checklist. Despite studies having explored suicidality in autistic adults, none had utilised a validated tool. Overall, there was lack of evidence in support of suicidality risk assessments successfully predicting future suicide attempts. We recommend adaptations to current suicidality assessment tools and priorities for future research, in order to better conceptualise suicidality and its measurement in autism.

one to tell, I did not consider it important etc.), could also reveal important information regarding risk level.
Given that the presentation of suicidality and cognitive characteristics of ASC may impede effective suicide risk assessment using traditional tools, it is crucial to identify which suicide risk assessments have been utilised in this group and, if none are available, to identify the most robust candidate tools in the general population to adapt. A growing body of systematic reviews shows a paucity of research exploring the measurement properties of outcome measures in ASC, and these reviews have made important recommendations to improve research and clinical assessment (Cassidy et al. 2018; Hanratty et al. 2015; Wigham and McConachie, 2014; McConachie et al. 2015). These reviews have used a validated research tool developed to assess the methodological quality of studies assessing the measurement properties of health outcome assessment tools: the consensus based standards for the selection of health measurement instruments (COSMIN) (Mokkink et al. 2016; Mokkink et al. 2012; Mokkink et al. 2010). The COSMIN method involves two stages. First, tools used to assess a health outcome in a well-defined population are identified from a systematic search of the literature. Subsequently, the tools used frequently (at least twice), with evidence of validity (i.e. with reference to a previously published study), are searched for using a comprehensive search tool validated for this purpose (Terwee et al. 2009). The quality of the available evidence is then rated using the COSMIN checklist (Mokkink et al. 2016).
It is important to note that tools are not either valid or invalid, but are rather valid for certain purposes or circumstances (Kamphaus & Frick, 2005). The COSMIN checklist allows a systematic assessment of the quality of evidence for and against a range of measurement properties, pooled across studies, thus providing a picture of the strengths and weaknesses of the most frequently used tools in different contexts. This allows us to make evidence based recommendations on which tools to select for particular clinical and/or research contexts. We therefore utilise this robust method to identify suicidality assessment tools used in autistic and general population adults, with similar age and intellectual ability, in order to draw conclusions about the relative quality of the evidence in each group regarding the measurement properties of these tools. Given that autistic adults have difficulty accessing psychiatric services due to lack of expertise and service provision for mental health in autism (Crane et al. 2018; Raja, 2014), suicidality assessment tools used in screening the general population in research and clinical practice will be particularly useful to adapt for autistic adults. The current study thus focused on identifying suicidality screening tools used in general population screening studies, as opposed to tools primarily used in psychiatric groups. From this synthesis of the available evidence, we subsequently make recommendations for future research and clinical practice aiming to effectively assess suicidality in autistic and non-autistic adults. Given the higher risk of death by suicide in autistic adults without ID (Hirvikoski et al. 2016), we focused the search on adults without ID.

Review Methods: Stage 1
The protocol for this review is registered within the International Register of Systematic Reviews (Registration number: CRD42016035217), and can be accessed online (http://www.crd.york.ac.uk/PROSPERO/prospero.asp). This systematic review follows the guidelines for Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) standards (Moher et al. 2015).

Search Strategy
The following electronic bibliographic databases were searched: Medline, PsycInfo and Web of Knowledge. The Cochrane Library was also searched to confirm that no other systematic reviews of the current study topic existed. Two searches were carried out in stage one, for suicidality measures used in: a) autistic adults, without co-occurring ID; and b) general population adults, without any co-occurring conditions or ID. The terms for each search strategy are included in Table 1. The searches were restricted to peer reviewed articles published in the English language between 1992 and 22nd January 2018, when the last searches were run. The current study focused on literature pertaining to ASC without co-occurring ID, which is frequently referred to as Asperger Syndrome (AS). AS was first included as a separate diagnosis in the WHO International Classification of Diseases in 1992, so we focused on studies published after this date, when we expected reference to AS to be more consistent in the literature.

Selection Criteria
We utilised a standardised approach to the selection of studies as in previous COSMIN reviews (e.g. Cassidy et al. 2018). We focused on tools that include more specific (i.e. specifically suicidality as opposed to self-harm or non-suicidal self-injury), and broader (including in depth assessment of suicidality to help gauge risk level) conceptualisations of suicidality than are feasible in single items or subscales. Single items and subscales typically fail to distinguish broader conceptualisations of self-harm from suicidal intent, and lack information on important risk indicators, such as current and lifetime experience, frequency, intensity, intent and access to means. Therefore studies had to focus on a tool specifically assessing suicidality, including assessment of suicidal intent (as opposed to self-harm more generally), clinically defined as in the International Classification of Diseases (ICD-10) and the Diagnostic and Statistical Manual of Mental Disorders (DSM-5). Studies which utilised tools with a single suicide related question, item or subscale contained within a larger measure (e.g. Mini-International Neuropsychiatric Interview (MINI) (Hergueta et al. 1998), Structured Clinical Interview for DSM-IV (SCID-II) (First et al. 1997)), and/or without evidence of validity (i.e. by reference to a previously published study), were excluded. This is necessary to maximise the probability of identified tools having evidence regarding their measurement properties in search two.
We searched for studies utilising tools to assess both prevalence of suicidality (epidemiological/population studies), and assess outcomes (treatment/intervention and longitudinal/cohort studies). To be included studies had to focus on adults aged 18 years and over, without ID. Where the age range was partly outside this, studies were included if 50% or more of the total population studied was over 18 years, and the mean age of the sample was 18 years or above. This ensured that the tools were likely to be appropriate for adults.
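The age criterion described above can be written as a simple check. The following is a minimal sketch, assuming that each study reports the proportion of its sample aged over 18 and the sample's mean age; the function name and the example values are hypothetical and are not taken from the review.

```python
def meets_age_criterion(proportion_adult: float, mean_age: float) -> bool:
    """Return True when 50% or more of the sample is adult
    and the mean age of the sample is 18 years or above."""
    return proportion_adult >= 0.5 and mean_age >= 18.0


# Hypothetical study summaries, for illustration only.
print(meets_age_criterion(0.62, 21.4))  # mostly adult sample, mean age above 18
print(meets_age_criterion(0.45, 19.0))  # fewer than half the sample are adults
```

Both conditions must hold, so a sample with a mean age above 18 is still excluded if fewer than half of its participants are adults.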
We excluded articles using tools which had been adapted specifically for another population than ASC or the general population (e.g. for older adults, a particular gender, or a specific culture). This was to ensure that the tool would likely be useful for assessing suicidality in general population adults, as opposed to a specific sub-group of the general population. We included studies using the most up to date version of the tool available, as this is most likely to be used in future research and clinical practice.

General population adult search criteria:
Studies were included if data from general population adults, without ID or co-occurring conditions, were presented separately, and comprised at least 50% of the total sample. Any studies including an autistic comparison group were excluded and considered for inclusion in the ASC search.
Autistic adults search criteria:
Studies were included if data from autistic adults were presented separately, and if 50% or more of the participants had a diagnosis of ASC.
One reviewer (SC) screened the titles and abstracts of articles for inclusion, and where there was any doubt on whether an article should be carried over to the full text sift, it was included. SC then conducted the full text sift of articles, with any ambiguous papers discussed with LB, EB and JR to reach consensus. All references of included articles were also searched for additional articles to include.

Data Extraction
Data extraction was performed by SC, and 20% of articles were independently checked by LB. A data extraction form was adapted from a previously developed form used in similar research (Cassidy et al. 2018; Wigham and McConachie, 2014). Data pertaining to participant characteristics, tools used, domains captured and study type were recorded.

ASC
The search for studies using tools to assess suicidality in autistic adults identified 672 articles which were screened, none of which were retained for analysis (Figure 1). A majority of the studies initially screened and excluded in the ASC search had explored self-injury and challenging behaviour in autistic adults, often with co-occurring ID, as opposed to suicidality (i.e. including intent to end one's own life). Crucially, although a limited number of studies had explored suicidality in autistic adults, none had used a validated tool designed to assess suicidality specifically. A majority of studies in both searches had utilised a single item designed for the specific study with no evidence of validity, or a single item or subscale contained within a larger mental health (MINI, SCID) or depression (e.g. PHQ-9, BDI) measure. As stated above, the current study focused on more specific and broader conceptualisations of suicidality than are possible in single items or subscales. Additionally, it is vital that there is evidence of validity of tools (e.g. by reference to a previous study) in the first stage, in order to identify tools which are likely to meet COSMIN inclusion criteria in the second stage.
Hence, no studies of suicidality in ASC were identified which have used a suicidality assessment tool with evidence of validity to consider further in stage two.

General Population
The search for studies using tools to assess suicidality in general population adults identified 1,774 articles which were screened, with 25 retained for analysis (Figure 1). Fourteen different tools were used to assess suicidality in the studies (Appendix A). Self-report Ideation (BSS) (Beck, Steer, and Ranieri, 1988), Beck Suicide Intent Inventory (BSI) (Beck, Schuyler and Herman, 1974), Depression Severity Index - Suicide Subscale (DSI-SS) (Metalsky and Joiner, 1997), Modified Scale for Suicidal Ideation (MSSI) (Miller et al. 1986), Paykel (Paykel et al. 1974 were not considered further, as we were interested in tools which had been used frequently (at least twice) in the general population with some evidence of validity, to maximise the chances of there being evidence available to evaluate using the COSMIN checklist. Hence, four tools (C-SSRS; SBQ-R; BSS; and Paykel) were considered further in stage 2.

Review Methods: Stage 2
The second stage of the review searched for evidence of the measurement properties of the tools identified in stage 1. In order to do this, a comprehensive search was carried out using a methodological filter in PubMed, designed to search for studies assessing the measurement properties of health outcome assessment tools (Terwee et al. 2009). We focused on studies which had explored the measurement properties of the tools in adults (18 years and over), without co-occurring ID. Unlike in stage 1, adult samples with co-occurring conditions were included, as suicidality assessment tools used in the general population may nevertheless have been validated in psychiatric samples.
Including studies of clinical samples thus provides useful information regarding the contexts the tools may be most useful in research and/or clinical practice.

Data extraction method
Once articles were identified from the search, the methodological quality of each article was assessed using the COSMIN checklist (Consensus based Standards for the selection of health Measurement Instruments). The checklists were completed by SC, with 9 (34.6%) of the articles independently rated by SW, both of whom were trained and experienced in using COSMIN. Inter-rater reliability between SC and SW was 73%, similar to previous studies (e.g. Cassidy et al. 2018; Wigham and McConachie, 2014). Disagreements were resolved through discussion and these agreed COSMIN ratings were utilised in the subsequent evidence synthesis.
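The text does not state how the 73% inter-rater figure was calculated. The following is a minimal sketch, assuming it is a simple percent agreement, that is, the share of ratings on which two raters gave the same COSMIN quality rating. The function name and the example ratings are hypothetical.

```python
from typing import Sequence


def percent_agreement(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Percentage of items on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must rate the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical COSMIN quality ratings from two raters (not data from the review).
first_rater = ["good", "fair", "poor", "excellent"]
second_rater = ["good", "fair", "fair", "excellent"]
print(percent_agreement(first_rater, second_rater))  # 75.0
```

Percent agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa would be an alternative.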

Evidence Synthesis
The quality of the evidence in support of each measurement property needs to be considered in the context of the studies' findings, in order to gauge the amount of evidence available for or against each measurement property. First, the quantitative findings from each study are given a rating of positive (in support of the property), indeterminate (not possible to deduce whether the evidence is for or against the property), or negative (evidence against the property). For example, criterion validity is considered positive when the study supplies convincing evidence that the criterion used is indeed a gold standard, and the correlation between the outcome measure and the gold standard criterion is greater
No articles assessing the measurement properties of the Paykel were identified from the search. The methodological quality of the included studies is presented in Table 2 and the collated evidence pertaining to the measurement properties of each tool is presented in Table 3. Many of the articles reported data on differences in scores and normative data, which are important for interpretability (De Vet et al. 2011). However, no studies reported minimal important change or floor or ceiling effects.

Suicide Behaviour Questionnaire - Revised (SBQ-R)
Despite evidence of being widely used in general population studies of suicidality, only two studies were found assessing the measurement properties of the SBQ-R in adults. The quality of the evidence in support of hypothesis testing was weak, with one fair study showing significant differences between psychiatric and non-clinical populations, in line with hypotheses and with a large effect (Osman et al. 2001). The quality of the evidence in support of criterion validity was moderate: sensitivity (>.882) and specificity (>.875) were acceptable for successfully differentiating suicidal from non-suicidal individuals, using both the first item of the SBQ-R (Aloba et al. 2017) and total scores (Osman et al. 2001; Aloba et al. 2017).
The evidence in support of internal consistency for the BSS was strong, with one

Columbia Suicide Severity Rating Scale (C-SSRS)
Most studies of the C-SSRS explored the measurement properties of the clinician interview version (7/11 studies). There was mixed evidence for internal consistency: two studies showed high Cronbach's alpha for the whole measure (Madan et al. 2016; Posner et al. 2011), but not on some subscales (Madan et al. 2016), and another study showed a poor alpha (Al-Halabi et al. 2016). There was mixed evidence for reliability: one excellent study showed a large range of inter-rater reliability (r = .5-.9) (Youngstrom et al. 2016), but two poor studies with small samples showed high agreement between raters (.9+) (Hesdorffer et al. 2013; Mundt et al. 2010). The evidence in support of structural validity was strong (Madan et al. 2016), as was the evidence in support of hypothesis testing and criterion validity (Madan et al. 2016; Al-Halabi et al. 2016; Horwitz, Czyz and King, 2015; Hesdorffer et al. 2013; Mundt et al. 2010; Posner et al. 2011). It is also important to note that one of these studies, rated as 'good', showed that the C-SSRS had acceptable specificity and sensitivity (>.7) for predicting future adverse events 6 months after discharge (Madan et al. 2016).
Four studies explored the measurement properties of the C-SSRS self-report version. There was weak evidence against hypothesis testing, with one fair study showing a poor correlation with the S-STS (Sheehan et al. 2014). There was mixed evidence for criterion validity: one good study showed high specificity and sensitivity with clinical assessment (Viguera et al. 2015); one fair study showed evidence against the measure with poor agreement with the S-STS (25); and another good study showed evidence against the measure with poor prediction of future adverse events (Chang and Tan, 2015).

Self-Injurious Thoughts and Behaviours Interview (SITBI)
One study had explored the measurement properties of the translated Spanish version of the SITBI in 150 inpatients (Garcia-Nieto et al. 2013). Evidence for reliability was mixed. The quality of the evidence for inter-rater reliability was poor given the small subsample in which this was assessed (n=15), but the findings supported the measure, with near perfect agreement between raters (k = .09 -1). The quality of the evidence for test-retest reliability was fair, but the findings were against the measure, with poor reliability for suicidal gestures and self-harm. The quality of the evidence for hypothesis testing was fair, but the findings were against the measure, with poor agreement with certain measures of similar constructs. Evidence for cross-cultural validity was poor, with only a forward translation carried out.

Discussion
Studies of suicidality in ASC were found to utilise a question generated for use in the specific study, without evidence of validity, or a single question or brief subscale from a broader mental health measure (e.g. PHQ-9, BDI, MINI, SCID). This may reflect the fact that many studies of suicidality in ASC have so far utilised convenience samples from clinical settings, wider studies and existing databases. This lack of standardised and in depth assessment is problematic. For example, single questions from depression measures such as the PHQ-9 do not distinguish self-harm from suicidal intent, and therefore do not assess suicidality per se. The range of measures, many of which lack evidence of validity, could also explain, at least in part, the wide range of suicidality estimates cited in recent reviews of suicidal ideation (11-66%) and attempts (1-35%) in ASC (Hedley and Uljarevic, 2018). A clear recommendation for future suicidality in ASC research is to start using suicidality assessment
Importantly, although the evidence for hypothesis testing and criterion validity was mixed for the BSS, this clearly depended on the context in which this tool was used. Specifically, the BSS had strong evidence in support of distinguishing sub-groups (e.g. those who have and have not attempted suicide), but strong evidence against predicting future adverse events (e.g. hospital admissions for suicide attempt). The BSS also had strong evidence for internal consistency and structural validity, and moderate evidence for cross-cultural validity.
Hence, the strengths of the BSS lie in distinguishing sub groups in research, but not when predicting future adverse events in clinical practice.
Two versions of the C-SSRS were assessed: the self-report and clinician interview versions. The self-report version has been more recently developed, and therefore fewer studies (4) were available assessing its measurement properties than the clinician interview version (7). For the self-report C-SSRS, there was weak evidence against hypothesis testing, and mixed evidence for criterion validity. Specifically, there was moderate evidence in support of agreement between the C-SSRS self-report and clinician assessment (Viguera et al. 2015), but moderate evidence against the C-SSRS self-report predicting future adverse events (Chang and Tan, 2015). There is not yet enough evidence to recommend this tool for use in research or clinical practice.
However, the clinician interview version of the C-SSRS had evidence in support of a number of measurement properties. The strengths of the measure lie in structural validity, hypothesis testing, criterion validity and responsiveness to change, with weak evidence also in support of cross-cultural validity. Importantly, there was moderate evidence in support of the C-SSRS predicting future suicidal behaviour within 6 months of discharge (Madan et al. 2016). There was, however, mixed evidence for internal consistency and reliability. This suggests that the clinician interview version of the C-SSRS is likely to be most useful in clinical contexts, to help clinicians gauge potential suicide risk as part of a holistic psychosocial assessment, and to assess changes in response to treatment or within clinical trials.
However, more research is needed to establish evidence in support of inter-rater agreement, and internal consistency, particularly concerning subscales.
There was only one study that had explored the measurement properties of the SITBI in adults without ID (with one additional validation study in an adolescent sample which was not included) (Garcia-Nieto et al. 2013). Hence there was limited evidence in support of its measurement properties. Notably, the study showed evidence against hypothesis testing with low agreement with measures of similar constructs. Future research needs to establish the measurement properties of this tool.
There were only two studies exploring the measurement properties of the SBQ-R, even though it has been used in a number of general population studies of suicidality. Nevertheless, there was strong evidence in support of internal consistency and structural validity, moderate evidence in support of criterion validity, and weak evidence in support of hypothesis testing. In particular, the SBQ-R showed evidence for high sensitivity and specificity for distinguishing sub-groups using the first item (Aloba et al. 2017) and total scores (Osman et al. 2001; Aloba et al. 2017). Notably, the SBQ-R is the briefest tool out of those identified in this review (with 1-4 items), does not carry a cost to use, and has comparable quality of evidence in support of a range of measurement properties compared to the other scales, which are longer and carry a cost (C-SSRS and BSS). Hence, the SBQ-R could be particularly useful for future research.
In summary, the current study revealed strong consistent evidence across three frequently used suicidality assessment tools (BSS, C-SSRS and SBQ-R), for reliably distinguishing sub-groups (e.g. those who have or have not attempted suicide in the past).
However, there were relatively few studies exploring an important component of criterion validity for suicidality assessment tools: prediction of future adverse events (e.g. future suicidal behaviour, future hospitalisations or emergency department visits). Research has suggested that suicidality assessment tools on the whole are poor predictors of future attempts, many perform worse than patient or clinical assessment, and may therefore be a waste of valuable resources (Quinlivan et al. 2017; Quinlivan et al. 2016). The current study adds useful evidence to this debate, as it is the first to use a validated research tool (COSMIN) to synthesise the quality of the evidence for a range of measurement properties, across a number of studies. On the basis of our synthesis of the available evidence, results suggest that certain tools (i.e. C-SSRS interview) may have greater utility in predicting future adverse events than others (e.g. BSS). Results also suggest that designs which assess criterion validity on the basis of distinguishing sub-groups may over-estimate diagnostic accuracy of a tool. This is consistent with previous research (Lijmer et al. 1999), which recommends the use of cohort studies in assessing the usefulness of suicidality assessment tools.
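The sensitivity and specificity figures discussed above follow from a two-by-two comparison of a tool's classification against a reference criterion. The following is a minimal sketch using the standard definitions; the function name and the counts are hypothetical and do not come from any study in this review.

```python
from typing import Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Compute sensitivity and specificity from confusion-matrix counts.

    tp: criterion-positive cases the tool flagged
    fn: criterion-positive cases the tool missed
    tn: criterion-negative cases the tool did not flag
    fp: criterion-negative cases the tool flagged
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("Need at least one criterion-positive and one criterion-negative case.")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts, for illustration only.
sens, spec = sensitivity_specificity(tp=45, fn=5, tn=80, fp=20)
print(round(sens, 3), round(spec, 3))  # 0.9 0.8
```

The same calculation applies whether the criterion is current sub-group membership or a future adverse event observed in a cohort; only the source of the criterion-positive and criterion-negative labels changes.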

Future Research
No studies have yet utilised, in autistic adults, any of the suicidality assessment tools that have been developed for and widely used in the general population. As discussed above, the characteristics of ASC, and the differing presentation of suicidality in this group, could all affect the utility of these tools. A first step would be to explore the content validity of these existing tools through focus groups and cognitive interviews, to inform adaptations, prior to exploring other measurement properties of these tools. COSMIN criteria stipulate that excellent studies should compare the performance of adapted to original measures (Mokkink et al. 2016). We also recommend comparing the performance of measures between ASC and general population groups, to ascertain whether measurement properties
A crucial aspect of exploring the validity of suicidality assessment tools is whether these are useful to clinicians in gauging risk of future suicide attempts. However, few studies have explored this crucial aspect of criterion validity. Hence, it is critical that future studies assessing criterion validity of suicidality assessment tools in autistic and general populations do not rely only on distinguishing sub-groups, which over-estimates the diagnostic accuracy of a tool. Rather, cohort studies are needed to assess whether current and adapted suicidality assessment tools can predict future suicidal behaviour significantly more accurately than clinician opinion or patient self-report.

Strengths and Limitations
A key strength was using a rigorous method (COSMIN) to systematically identify and evaluate relevant studies. However, following this strict method meant that some tools were excluded from the analysis, such as single items or subscales from broader mental health measures. As suicidality in ASC is such a new area of research, it could be argued that adopting such rigorous methods might have led us to overlook other relevant data which could indicate the usefulness of one tool over another. However, we were interested in more specific and broader conceptualisations of suicidality than are feasible in single questions or subscales. We also focused on tools which had been used frequently in general population adults, without ID or co-occurring conditions, rather than including measures only used in psychiatric groups, as these tools were more likely to be useful for a range of non-clinical and clinical groups, and in a range of clinical and research contexts. Our search was also limited by focusing only on studies in English, due to lack of translation resources, and data extraction was conducted only in part by two independent reviewers. However, there was good agreement between raters in the current study (73%), similar to previous COSMIN reviews (e.g. Cassidy et al. 2018; Wigham and McConachie, 2014).

Conclusion
This is the first systematic review to use a robust research tool (

Table 1. Search terms
1. (general population or population sample or community sample or national* survey or household* survey or non referred or non clinical or population screen*)
2. (ASC or ASD or Asperg* or Autis* or high functioning or pervasive developmental disorder* or PDD or HFA)
3. (adult*)
4. (assess* or tool or treatment outcome or measur* or scale or quotient or inventory or instrument)
5. (suicid* or self harm or self inj* or parasuicide or suicide attempts or attempted suicide)
6. randomised controlled trial or randomized controlled trial
7. random*
8. comparative stud*
9. prospective stud*
10. intervention
11. treatment effectiveness evaluation or treatment response or treatment study
12. epidemiolog*
13. prevalence
14. General Population Search: (6 or 7 or 8 or 9 or 10 or 11 or 12 or 13) and (1 and 3 and 4 and 5)
15. Autism Spectrum Condition Search: (6 or 7 or 8 or 9 or 10 or 11 or 12 or 13) and (2 and 3 and 4 and 5)
16. limit 14 and 15 to English Language; 1992 - current; age 18 years +