Investigating the validity of the Strengths and Difficulties Questionnaire to assess ADHD in young adulthood

Highlights • The SDQ is widely used to assess ADHD symptoms in children.• We investigated the validity of the SDQ to assess ADHD at age 25 years.• The SDQ ADHD subscale had high validity in distinguishing DSM-5 ADHD cases from non-cases.• A lower cut-point is needed to identify ADHD diagnosis in young adulthood compared to younger ages.• The SDQ is appropriate for ADHD research across different development periods.


Introduction
Attention Deficit Hyperactivity Disorder (ADHD) typically is a childhood-onset neurodevelopmental disorder characterized by symptoms of inattention and hyperactivity/impulsivity. Symptoms tend to decline across childhood and adolescence, but around 65% of individuals with a childhood diagnosis are estimated to show some level of symptom persistence into adulthood  and estimates of the prevalence rate of adult ADHD are approximately 2.5% to 4.4% (Kessler et al., 2006;Simon et al., 2009). ADHD symptoms are associated with distress and impairment, including in educational, occupational, social and other settings, across childhood, adolescence and in adulthood (Asherson et al., 2012;Thapar and Cooper, 2016).
Given the life-course impacts of ADHD, there is a need to be able to assess symptoms across development to monitor and investigate correlates of symptom persistence and desistence. In particular, there is growing interest in the transition from childhood and adolescence to young adulthood (Ford, 2020) -a period that is associated both with changing demands and the transition from child/adolescent mental health services (CAMHS) to adult mental health services (AMHS). Robust investigation of symptom continuity and discontinuity requires repeated assessments using the same measure (Goodman et al., 2007); however, there is a lack of research into whether measures commonly used to assess ADHD symptoms in childhood and adolescence are also valid assessment tools in adulthood.
Adult ADHD has generally been assumed to be a continuation of childhood ADHD and this is supported by findings that adult ADHD symptoms and diagnosis show similar characteristics (e.g. high heritability, associations with other neurodevelopmental problems) when accounting for change in measure and informant (Larsson et al., 2014;Riglin et al., 2020;Rovira et al., 2020). However, the Diagnostic and Statistical Manual of Mental Disorders (5th edition, DSM-5) criteria acknowledge some developmental differences by requiring fewer symptoms for a diagnosis of ADHD in adulthood compared to earlier in life (American Psychiatric Association, 2013). Research also suggests some change in the presentation of ADHD symptoms with age, whereby inattentive symptoms are more likely to persist whereas hyperactive-impulsive symptoms seem to become less common with age (Davidson, 2008;Willcutt et al., 2012). The validity of using child/adolescent based measures in adulthood cannot therefore be assumed.
The Strengths and Difficulties Questionnaire (SDQ) hyperactivity/ ADHD subscale (Goodman, 1997) is a brief screening tool and has been used widely to assess ADHD symptoms in children and adolescents in both research and clinical settings across different countries. Self, parent and teacher versions of the questionnaire have been validated for children and adolescents, with self-reports suitable for children aged 11 years or older (Goodman, 2001;Goodman et al., 2000;He et al., 2013). More recently an adult version of the SDQ has been developed (using almost identical wording for the ADHD subscale), which has been found to show similar psychometric properties to child/adolescent samples (Brann et al., 2018). However, to our knowledge the ADHD subscale of the SDQ has yet to be validated against ADHD diagnosis in adulthood.
The aim of this study was to examine the criterion validity of the SDQ as an assessment of ADHD symptoms in young adulthood, by examining its ability to discriminate DSM-5 ADHD cases from non-cases at age 25 years in a UK population cohort. We also explored whether a different cut-point to that recommended in childhood and adolescence is required to optimize sensitivity and specificity with respect to an adult diagnosis. Follow-up analyses included 1) separate analyses for males and females, and 2) assessing generalizability across subscales by comparing results for the ADHD subscale to the other SDQ (emotional, conduct, peer, prosocial) subscales.

Sample
We analyzed data from the Avon Longitudinal Study of Parents and Children (ALSPAC), a well-established prospective, longitudinal birth cohort study Fraser et al., 2013;Northstone et al., 2019) (total possible N = 14,901). Details of this study are provided in the Supplementary Material. Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Informed consent for the use of data collected was obtained from participants following the recommendations of the ALSPAC Ethics and Law Committee at the time. Individuals were included in our analyses when ADHD diagnosis data were available at age 25 years (28% of the total possible sample) (see below): comparisons between those with and without diagnosis data are given in Supplementary Table 1. Primary analyses were conducted using self-reports as would typically be used in adult mental health services (N = 4121), with secondary analyses conducted using parent-reports (N = 4330) (see below).

Diagnoses
ADHD diagnosis was assessed at age 25 years using the self-rated Barkley Adult ADHD Rating Scale (BAARS-IV) (Barkley, 2011). The BAARS-IV is an assessment tool for adult ADHD which includes the 18 DSM-5 ADHD items and ten domains of impairment (all items scores 0-3). Diagnoses were generated from the BAARS-IV based on DSM-5 criteria (American Psychiatric Association, 2013), using diagnostic coding constructed by SSA, KL and LR with advice from clinical psychiatrists OE and RBJ (blinded to the SDQ scores). In-line with recommendations, ADHD symptoms were defined to be clinically significant if endorsed as occurring 'often' or 'very often' (Barkley, 2011). In line with DSM-5, diagnostic coding defined ADHD diagnosis as the presence of five or more symptoms of inattention or of hyperactivity/impulsivity, symptom onset prior to age 12 years (assessed retrospectively also using the BAARS-IV), and current symptoms to be accompanied by impairment in social, academic, or occupational functioning (see Supplementary Table 2) (American Psychiatric Association, 2013).

Strengths and Difficulties Questionnaire (SDQ)
Participants also completed the self-rated age 18+ version of the SDQ which includes the five item hyperactivity/ADHD subscale (Goodman, 1997) (range 0-10) at age 25 years. The ADHD subscale is well-validated in child and adolescent samples, for which the recommended cut-point for "high" symptoms (top 10% of the general population) is ≥7, with scores of 0-5 categorized as close to average (Green et al., 2005). In line with recommendations (www.sdqinfo.org) total ADHD subscale scores were derived using mean imputation for those with (≤2) of SDQ items missing. Follow-up analyses investigated the other SDQ subscales of emotional problems, conduct problems, peer problems and prosocial behavior (each subscale has five items, range 0-10). The SDQ can be downloaded from https://www.sdqinfo.org/.

Analyses
Data were analyzed using Stata version 14. Receiver Operating Characteristic (ROC) curve analyses were used to examine the ability of the SDQ to distinguish between cases and non-cases of ADHD classified using the BAARS-IV. A ROC curve was plotted (sensitivity vs 1-specificity) and the area under the curve (AUC) estimated. Stata's cvauroc function (Luque-Fernandez et al., 2019) was used to minimize overfitting by implementing a 10-fold cross-validation and bootstrapping cross-validated AUC values to obtain bias-adjusted confidence intervals. The AUC assesses the diagnostic efficiency of the SDQ subscale in relation to meeting ADHD diagnostic criteria, whereby 0.5 indicates performance at chance level and 1.0 indicates perfect detection. Values < 0.7 are generally considered low and those >0.9 excellent (Swets, 1988).
Sensitivity reflects the ability to correctly identify all those with a diagnosis (true positives) whereas specificity reflects the ability to correctly identify all those without a diagnosis (true negatives). Optimum cut-points for ADHD diagnosis were selected as those that best balanced sensitivity and specificity, i.e. assuming that false positives and false negatives are equally undesirable, according to maximal Youden Index (sensitivity + specificity -1) (Youden, 1950). Given that previous work recommending cut-points in child and adolescent samples were based on identifying the top 10% of the population (Goodman, 1997;Green et al., 2005), we also explored which cut-points would capture 10% of our sample. Positive predictive values (PPV) and negative predictive values (NPV) were calculated for proposed cut-points, which reflect the probability that those exceeding the cut-point have a diagnosis and the probability that those not meeting the cut-point do not have a diagnosis, respectively.
Follow-up analyses were conducted (i) separately for males and females, using Stata's roccomp function (Cleves, 2002) to compare AUC values by sex, and (ii) using the other SDQ subscales to distinguish between cases and non-cases of ADHD. Secondary analyses were conducted using parent-reports instead of self-reports. As parents have been observed to be important informants for ADHD symptoms even in early adulthood (Barkley et al., 2002), we tested the validity of the parent-rated SDQ against parent-reported diagnosis (see Supplementary Material).

DSM-5 diagnosis of ADHD
Of those with available data, 2.9% met DSM-5 diagnostic criteria for ADHD at age 25 years (see Supplementary Table 2). At age 25 years, male sex was not associated with an increased likelihood of meeting ADHD criteria (3.0% in females and 2.9% in males: OR=0.97, 95% CI=0.66-1.43, p = 0.89).

Receiver Operating Characteristic (ROC) curve analyses
ROC curve analyses suggested the SDQ ADHD subscale has high validity in distinguishing ADHD cases from non-cases (AUC=0.90, 95% CI=0.87-0.93), as shown in Fig. 1. Sensitivity and specificity values for all possible SDQ ADHD subscale cut-points are shown in Table 1.

Optimum cut-points
The optimum cut-point for distinguishing ADHD cases from noncases was ≥5, which showed high validity in correctly identifying those meeting diagnostic criteria (i.e. true positives: sensitivity=89.3%) and good validity in correctly identifying those who did not meet diagnostic criteria (i.e. true negatives: specificity=76.7%). This cutpoint captured 25.3% of the sample and was associated with a very high probability that those not meeting the cut-point did not meet diagnostic criteria (NPV=99.6%) but a lower probability that those exceeding the cut-point met diagnostic criteria (PPV=10.4%).
In line with previous work in younger samples (Goodman, 1997;Green et al., 2005) we also inspected the cut-points which identified the top 10% of the sample: this was ≥6 (sensitivity=76.0% and specific-ity=88.7%), which again was associated with a very high but slightly attenuated probability that those not meeting the cut-point did not meet diagnostic criteria (NPV=99.1%) but a slightly higher probability those exceeding the cut-point met diagnostic criteria (PPV=16.9%). The 2×2 tables for SDQ cut-point by whether or not individuals met diagnostic criteria along with sensitivity, specificity, PPV and NPV for both SDQ cut-points (≥5 and ≥6) are shown in Supplementary Table 3.

Other SDQ subscales
Analyses examining the other SDQ subscales found lower validity in distinguishing ADHD cases from non-cases; as shown in Table 3, AUC values ranged from 0.58 to 0.75 for the prosocial, conduct problems, peer problems and emotional problems subscales.

Secondary analyses using parent-reports
Analyses using parent-reported data found that the parent-reported SDQ was also able to accurately identify ADHD diagnosis (see Supplementary Material): AUC=0.97 (95% CI=0.92-0.98) but with a lower optimum cut-point of ≥4, capturing 13.2% of the sample.

Discussion
This study aimed to examine the validity of the SDQ as an assessment of ADHD symptoms at age 25 years in a UK population cohort. We found the hyperactivity/ADHD subscale to have high validity in distinguishing those meeting ADHD diagnostic criteria from those who did not. This suggests that the SDQ subscale, which is widely used in child and adolescent populations, is also suitable for use in young adults.
Our finding of excellent validity of the self-rated SDQ for measuring ADHD in young adulthood (AUC=0.90) is similar if not somewhat higher than previous work in children and adolescents (using self-and parent-reports) (Algorta et al., 2016;Goodman et al., 2003), including previous work in the present sample at age 17 years (AUC=0.89) using parent-reports (Caye et al., 2019). In general, despite the trend that hyperactive-impulsive symptoms are likely to decline whilst inattentive symptoms persist with age (Davidson, 2008;Willcutt et al., 2012), the SDQ subscale -which includes three hyperactive-impulsive and two inattentive symptoms -shows high validity across childhood, adolescence and into young adulthood. This suggests that while the presentation of symptoms may change with age, the SDQ items capture a similar construct. Indeed, previous work in a smaller high-risk sample  found the SDQ to have similar psychometric properties (e.g. inter-scale correlations, internal consistency, and inter-rater agreement) in adults compared with adolescents (Brann et al., 2018). We also investigated whether different cut-points are required to capture clinically relevant symptoms in young adulthood compared to younger ages. Current recommendations based on identifying the top 10% of the population suggest that in childhood and adolescence, selfrated scores of ≥7 capture those with high symptoms, with scores of 0-5 capturing those close to average (Goodman, 1997;Green et al., 2005). At age 25 years, our results suggest that lower cut-points are needed to capture clinically relevant symptoms, with scores of ≥5 achieving the best balance between sensitivity and specificity in identifying those meeting ADHD diagnostic criteria and a cut-point of ≥6 capturing 10% of our sample. These lower thresholds likely reflect the generally lower levels of ADHD symptoms in adulthood and is consistent with diagnostic criteria that require fewer symptoms to be present for an adult diagnosis to be made (American Psychiatric Association, 2013). While these cut-points performed well in identifying those meeting diagnostic criteria, this short screening tool captured a broader group of individuals, with the cut-point of ≥5 capturing 25% of the sample. For those aiming to identify a smaller proportion of their research samples, it may be preferential to select the top 10% (requiring a cut-point of ≥6 in this sample). Replication of these cut-points in different samples is an important area for future research.
The selection of appropriate cut-points depends on the aim and rationale for using the instrument. We selected cut-points based on the premise that false positives and false negatives were equally undesirable (Youden, 1950), which is most likely appropriate for a one-stage approach in research settings to define a group most likely to have a diagnosis. However, for use in high-risk samples or as a clinical screening measure, possible cases would be followed up with further clinical assessment in a two-stage approach. In these circumstances, an increased number of false positives may be considered acceptable (Goodman, 1997) and lower cut-points that favor sensitivity may be preferable. In contrast, higher cut-points that favor specificity would be more appropriate for measures that are used for diagnostic purposes (Silva et al., 2015). The selection of cut points can also depend on sample characteristics such as age and sex. Our study suggests the need for a lower cut-point in young adulthood to capture ADHD in young adulthood than in childhood/adolescence, but we found little evidence of sex differences in the ability of the SDQ to assess young adult ADHD, in line with previous work in children and adolescents (Algorta et al., 2016). Finally, we found the ADHD subscale of the SDQ to show higher validity in distinguishing between cases and non-cases of ADHD compared to the other SDQ subscales (emotional, conduct, peer, prosocial), which support the specificity of this subscale.
While our primary analyses focused on self-reports, we conducted secondary analyses using parent-reports (of both the SDQ and to define diagnosis). While adult mental health services may not typically involve parents, research studies often utilize parent ratings at earlier ages, making the validity of the parent-rated SDQ to assess ADHD in young adulthood important for researchers interested in assessing continuity and discontinuity, which ideally requires repeated assessments using the same measure and informant (Goodman et al., 2007). We found parent-ratings using the young adult SDQ to also show high validity and consistent with our primary analyses using self-reportsresults suggested that lower cut-points are needed to capture clinically relevant symptoms (≥4 in young adulthood compared to ≥8 in childhood/adolescence.) The validity of parent ratings of their young adult offspring's ADHD symptoms is consistent with recent work in this sample which found parent SDQ ratings of ADHD show similar genetic and neurodevelopmental correlates at age 25 as in childhood, suggesting that these reports capture a similar construct across development (Riglin et al., 2020) and implying that parents may still provide valuable insight into their offspring's symptoms at 25 years.
Our study has a number of strengths, including the use of a large population sample and the investigation of a measure widely used in childhood and adolescence, which will enable continuity in future investigation of ADHD across different developmental periods. However, findings should also be considered in light of limitations, including non-random attrition which means those included in this sample will not be fully representative of the broader population, as those with elevated risk of psychopathology are more likely to have dropped out of this birth-cohort by the time data were collected at age 25 years (Martin et al., 2016;Taylor et al., 2018). This is particularly relevant to sample specific cut-points (i.e. capturing the top 10%) which will likely be lower than would be found in the general population. We were also reliant on questionnaire rather than direct interview data to generate ADHD diagnoses, although this is a widely used tool which covers all diagnostic criteria, including impairment and age-at-onset (Barkley, 2011). Finally, it is not clear the extent to which findings from this community sample are applicable to high-risk or clinical samples.
In conclusion, we found the widely-used and freely-available SDQ, used in children and adolescents, to be a valid measure for assessing ADHD symptoms in young adults (at age 25 years). We identified a different, lower cut-point to help more accurately identify those who may have an ADHD diagnosis in this age group. Our findings suggest that the SDQ is suitable for ADHD research across different developmental periods, which will aid the robust investigation of ADHD from Table 2 Sensitivity and specificity for SDQ hyperactivity/ADHD cut-points compared against ADHD diagnosis for males and females separately.  childhood to young adulthood.

Author contributions
LR, SSA, OE, RBJ and AT conceptualised the study, LR, SSA, OE, RBJ and KL derived the variables, LR wrote the original draft, SSA performed statistical analyses, ES and AT attained the study funding. All authors contributed to the interpretation of data and reviewed/edited the manuscript.

Declaration of Competing Interest
None.

Acknowledgement
We are extremely grateful to all the families who took part in this study, the midwives for their help in recruiting them, and the whole ALSPAC team, which includes interviewers, computer and laboratory technicians, clerical workers, research scientists, volunteers, managers, receptionists and nurses. The UK Medical Research Council and Wellcome (Grant ref: 217065/Z/19/Z) and the University of Bristol provide core support for ALSPAC. This publication is the work of the authors and Lucy Riglin, Sharifah Shameem Agha and Anita Thapar will serve as guarantors for the contents of this paper. A comprehensive list of grants funding is available on the ALSPAC website (www.bristol.ac.uk/alspa c/external/documents/grant-acknowledgements.pdf). The measures used in the paper were specifically funded by the Wellcome Trust (204895/Z/16/Z). REW and ES work in a unit that receives funding from the University of Bristol and the UK Medical Research Council (MC_UU_00011/1 and MC_UU_00011/3). RBJ is supported by the Welsh Government through Health and Care Research Wales (National Institute for Health Research Fellowship, NIHR-PDF-2018). REW is supported by a postdoctoral fellowship from the South-Eastern Regional Health Authority (2020024). This research was funded by the Wellcome Trust (204895/Z/16/Z). For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.