Psychometric properties of the Swedish version of the Reynolds Adolescent Depression Scale second edition (RADS-2) in a clinical sample

Abstract Objective: Observed and predicted increases in the global burden of disease caused by major depressive disorder (MDD) highlight the need for psychometrically robust multi-dimensional measures to use for clinical and research purposes. Reynolds Adolescent Depression Scale second edition (RADS-2) is an internationally well-validated scale measuring different dimensions of adolescent depression. The Swedish version has previously only been evaluated in a normative sample. Methods: We collected data from patients in child and adolescent psychiatry and primary care and performed: (1) Confirmatory factor analysis (CFA) to evaluate the established four-factor structure, (2) Analyses of reliability and measurement invariance, (3) Analyses of convergent and discriminant validity using the Montgomery–Asberg Depression Rating Scale, the depression subscales of the Beck Youth Inventories and the Revised Child Anxiety and Depression Scale, as well as the Patient Reported Outcome Measurements Information System, peer-relationships and physical activity item banks. Results: Recruited participants (n = 536, 129 male and 407 female, mean age 16.45 years, SD = 2.47, range 12 − 22 years) had a variety of psychiatric diagnoses. We found support for the four-factor structure and acceptable to good reliability for the subscale and total scores. Convergent and discriminant validity were good. Measurement invariance was demonstrated for age, sex, and between the present sample and a previously published normative sample. The RADS-2-scores were significantly higher in the present sample than in the normative sample. In this clinical study, the Swedish RADS-2 demonstrated good validity and acceptable to good reliability. Our findings support the use of RADS-2 in Swedish clinical and research contexts.


Introduction
It is predicted that major depressive disorder (MDD) will soon top the list of mental and physical disorders with the largest negative impact on global health [1], and it is already causing substantial disability worldwide [2]. The teenage years are a vulnerable developmental period during which the MDD prevalence is increasing, particularly in females [3]. This early onset of MDD increases the risk for recurrent depressive episodes [4] and increases all-cause mortality as well as suicide rates [5]. The clinical picture of teenage MDD differs from that of adults. For example, the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-5) allows for the replacement of depressed mood with irritability as a core diagnostic symptom criterion [6]. At the same time, teenagers and adults up to their mid-twenties share both pathophysiological and neurodevelopmental similarities [7,8], which has warranted research spanning this critical age-range [9]. In the evaluation of symptom severity and treatment effectiveness for individual patients, as well as more broadly for research with national and international ambitions, measures of depression that are valid across different contexts, age-groups, and languages are clearly needed.
The Reynolds Adolescent Depression Scale second edition (RADS-2) is one such internationally well-validated age-appropriate measure, and by means of self-report it quantifies four dimensions of depression: dysphoric mood, anhedonia/negative affect, negative self-evaluation, and somatic complaints [10]. RADS-2 is compatible with both the DSM-5 and the International Classification of Diseases, eleventh edition, and the scale is widely used both clinically and in research [11-19]. The four-factor structure of RADS-2 has been supported in several confirmatory factor analyses, see, e.g. [11,13,16], and convergent as well as discriminant validity are good to excellent both in non-clinical [11,13] and in clinical [17] samples. Reliability has also been demonstrated in large samples [10]. We have translated RADS-2 to Swedish and have replicated these findings in a normative sample study previously published in this journal [20]. In that sample, measurement invariance was also confirmed for sex and age-group [20]. Since the scale is intended for clinical application [10] and since valid outcome-measures are needed for both ongoing and future clinical trials, including in participants from a wider age-range than previously studied [9], we here present data on the psychometric properties of RADS-2 in a heterogeneous clinical sample with affective symptomatology.
In this study, we aimed to evaluate the RADS-2 factor structure, validity, and reliability as well as measurement invariance to determine whether the scale measures the construct equivalently in males compared to females, and in teenagers 12-17 years old compared to young adults 18-22 years old. We also aimed to compare individual scores from a normative non-clinical sample with scores from this clinical sample to test the hypothesis that the scores would be higher in the clinical sample. To support such comparisons, we also aimed to test measurement invariance for the clinical and non-clinical samples.

Materials and methods
This cross-sectional study was approved by the regional ethical review board at Umeå University in Sweden (D.nr 2018/59-31), by PAR-inc., the publisher and copyright holder of RADS-2, as well as by the manager of each participating clinic. Written informed consent was collected from all participants before inclusion. Additional parental consent was collected for participants below 15 years of age. A reimbursement equivalent to 20 Euro was provided to the participants.

Participant recruitment and data collection
Participants were recruited from four child and adolescent psychiatry clinics, one primary care youth-clinic and one primary care health-clinic, all in four Swedish cities/towns with populations ranging from 8000 to 130,000. Flyers were posted in the waiting-rooms of the clinics. Information was also sent out by mail or SMS to patients with affective disorders who were admitted to the clinics and either waitlisted for treatment or in active treatment; for details see the eligibility criteria below. Those who did not respond to these invitations were contacted over the phone once.
Eligibility criteria were: (1) Being between 12 and 22 years of age, (2) Being a patient at any of the recruiting sites, (3) Having self-reported or parent-reported symptoms of depression and/or anxiety (all comorbidities were allowed), (4) For individuals with a recent history of suicide attempt or psychiatric inpatient-care a minimum time of three months had to have passed from the suicidal event or since discharge from hospitalization, and (5) Fluency in written Swedish and ability to complete the questionnaires.
Eligible participants were sent a link by email to an online platform where they signed in from their preferred location and device, provided written informed consent, and responded to the questionnaires. The order of the scales in the questionnaire was altered between participants to prevent a bias effect of fatigue in replying. At the end of questionnaire-data collection additional data on psychiatric diagnoses that had been given one month before and after the self-rating was extracted from the individual medical records of the participants that were recruited from child and adolescent psychiatry. Descriptive statistics of the participant demographics are presented in Table 1. Data collection was performed between 2019 and 2022.
We also re-analyzed data from a previously published normative sample of n = 637 [20], to evaluate differences in RADS-2 scores between the normative and clinical samples. The methods used for participant recruitment and data collection for the normative sample have been described in detail in our previous publication [20].

Reynolds Adolescent Depression Scale second edition (RADS-2)
RADS-2 is a 30-item self-rating scale with brief self-statements like 'I feel like crying' [10]. Response options are ordinal on a four-point scale ranging from 'almost never' to 'most of the time'. The four subscales are 'dysphoric mood', 'anhedonia/negative affect', 'negative self-evaluation', and 'somatic complaints'. Items in the anhedonia/negative affect subscale are reversely phrased, e.g. 'I feel happy' and hence reversely coded. The scale sum raw score ranges from 30 to 120 and higher scores indicate more severe symptomatology [10]. RADS-2 has been translated to Swedish and previously validated in a normative sample [20].
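The scoring rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: responses are taken to be coded 1 to 4, and the reverse-keyed item numbers shown are placeholders, since the actual item key is defined in the RADS-2 manual [10] and is not reproduced here. The function name `rads2_total` is hypothetical.

```python
from typing import Iterable, Sequence

N_ITEMS = 30                       # RADS-2 has 30 items
MIN_RESPONSE, MAX_RESPONSE = 1, 4  # assumed coding of the four response options


def rads2_total(responses: Sequence[int], reverse_items: Iterable[int]) -> int:
    """Sum the 30 item responses after reverse-coding the listed 1-based items."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    reverse = set(reverse_items)
    total = 0
    for item_number, value in enumerate(responses, start=1):
        if not MIN_RESPONSE <= value <= MAX_RESPONSE:
            raise ValueError(f"item {item_number}: response {value} out of range")
        if item_number in reverse:
            # Reverse-coding on a 1-4 scale maps 1<->4 and 2<->3.
            value = MIN_RESPONSE + MAX_RESPONSE - value
        total += value
    return total


# Placeholder item numbers for illustration only, NOT the real RADS-2 key.
EXAMPLE_REVERSE_ITEMS = (1, 2, 3)

lowest = rads2_total([1] * N_ITEMS, reverse_items=())
highest = rads2_total([4] * N_ITEMS, reverse_items=())
with_reversal = rads2_total([4] * N_ITEMS, reverse_items=EXAMPLE_REVERSE_ITEMS)
```

With no reverse-keyed items, the two extreme response patterns reproduce the stated raw-score range of 30 to 120.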

Beck Youth Inventories
From the Beck Youth Inventories of Emotional and Social Impairment [21] we specifically used the depression subscale (BYI-D). BYI-D consists of 20 brief self-statements, e.g. 'I feel sad', with responses on a four-point ordinal scale ranging from 'never' to 'always'. The sum raw score range is 0-60 and higher scores indicate more severe symptomatology. Internationally, high internal consistency as well as test-retest reliability have been demonstrated in large samples [21]. Even though the discriminative ability of the scale has been questioned [22], findings of good convergent validity have been replicated in a clinical adolescent outpatient sample [23]. In Sweden, BYI-D is widely used in Child and Adolescent Psychiatry to evaluate depression severity [24]. It is also recommended by the Swedish Agency for Health Technology Assessment and Assessment of Social Services in screening for MDD [25]. In the current sample the BYI-D Cronbach's alpha was 0.95 (95% CI 0.94-0.95).

Montgomery-Asberg Depression Rating Scale (MADRS)
The MADRS is a scale that is widely used to assess depression severity [26][27][28]. We used the self-rating version of the scale, which includes nine items on a seven-point ordinal scale, e.g. reported sadness, with a sum raw score range of 0-54. Higher scores indicate more severe symptomatology [29]. Reliability and validity are good in Swedish adolescent psychiatric outpatients [29]. In this sample, the MADRS Cronbach's alpha was 0.89 (95% CI 0.87-0.90).

Revised Child Anxiety and Depression Scale (RCADS)
From the RCADS we specifically used the depression subscale (RCADS-depression), which consists of 10 items rating the extent to which one is, e.g. 'feeling sad or empty', on a four-point ordinal scale from 'never' to 'always' [30]. The scale has good validity and reliability in clinical populations of children and adolescents in different assessment settings, countries, and languages, see, e.g. [31-34]. The sum raw score range is 0-30 with higher scores indicating more severe symptomatology [31]. In this sample, the RCADS-depression Cronbach's alpha was 0.90 (95% CI 0.89-0.91).

Patient-Reported Outcome Measurement Information System (PROMIS)
The National Institutes of Health developed the Patient Reported Outcome Measurement Information System (PROMIS), which contains item banks for various health and lifestyle dimensions [35]. We used the PROMIS Pediatric Bank version 1.0 [36] Physical activity (PROMIS-physical activity) and PROMIS Pediatric Bank version 2.0 [36] Peer-relationships (PROMIS-peer-relationships) in this study. These item banks consist of 10 and 15 questions, respectively, each framed in past tense starting 'In the last seven days …', e.g. '… I was able to count on my friends.' Responses are recorded on a five-point ordinal scale ranging from 'Never' to 'Almost always'. The sum raw score range is 0-40 (PROMIS-physical activity) and 0-60 (PROMIS-peer-relationships), and higher scores indicate more of the measured construct. For more information on item definitions and the concepts behind them, see [36]. The item banks used for this study have been translated and culturally adapted for Swedish adolescents [37] and the former of the two has been psychometrically evaluated in Swedish adolescents with good reliability [38]. In this sample, the Cronbach's alphas were 0.93 (95% CI 0.92-0.94) and 0.90 (95% CI 0.88-0.91) for PROMIS-physical activity and PROMIS-peer-relationships, respectively.

Data analysis
We used standard measures for descriptive statistics. On RADS-2, between 2 and 5 responses (0.4-0.9%) were missing for each individual item. Thirteen participants (2.4%) had missing values on at least one RADS-2 item, and Little's test was not significant (χ² = 264.49, df = 284, p = 0.79). We therefore assumed that data were missing completely at random and removed these individuals from the dataset altogether. Similarly, for the validity analyses, listwise deletion of individuals with missing item-level data was applied to BYI-D, MADRS, RCADS-depression, and the PROMIS item banks to allow for total-score calculations (the range of missing item-level data was 0-6, i.e. 0-1.1% for individual items). Total sum-scores of items on ordinal scales were conservatively treated as ordinal variables throughout the analysis. To analyze sex and age-group (12-17 years and 18-22 years) differences in RADS-2 subscale scores and total-scale scores we used the Mann-Whitney U test, which was also used to compare RADS-2 scores in the clinical and normative samples.
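For readers unfamiliar with the Mann-Whitney U statistic, the following Python sketch computes it by direct pairwise comparison. It is an illustration only and not the SPSS or R routine used in the study; it returns the statistic without a p value, and the function name `mann_whitney_u` and the example scores are hypothetical.

```python
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (U_x, U_y) for two independent samples.

    U_x counts the pairs in which an observation from x exceeds one from y,
    with tied pairs counted as one half. U_x + U_y always equals len(x) * len(y).
    """
    u_x = 0.0
    for a in x:
        for b in y:
            if a > b:
                u_x += 1.0
            elif a == b:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


# Hypothetical total scores for two small groups, for illustration only.
group_a = [52, 61, 58]
group_b = [70, 83, 77]
u_a, u_b = mann_whitney_u(group_a, group_b)
```

Because the statistic depends only on the ordering of observations, it is consistent with treating sum-scores as ordinal variables.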
McDonald's coefficient Omega [44] was used to test reliability and the following well-established cut-offs were used in the interpretation of internal consistency: ≥ 0.7 = acceptable, ≥ 0.8 = good, and ≥ 0.9 = excellent [45].
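As a rough illustration of how such a coefficient and its interpretation fit together, the sketch below computes omega for a single factor from standardized loadings, assuming uncorrelated residuals, and then applies the cut-offs above. The exact estimation approach used in the study is not specified here, so this textbook formula is an assumption, as are the function names and the label returned for values below 0.7.

```python
from typing import Sequence


def omega_from_loadings(loadings: Sequence[float]) -> float:
    """Omega for one factor from standardized loadings (uncorrelated residuals assumed)."""
    common = sum(loadings) ** 2                            # variance due to the factor
    unique = sum(1.0 - loading ** 2 for loading in loadings)  # residual variances
    return common / (common + unique)


def interpret_reliability(coefficient: float) -> str:
    """Apply the cut-offs from the text: >= 0.7 acceptable, >= 0.8 good, >= 0.9 excellent."""
    if coefficient >= 0.9:
        return "excellent"
    if coefficient >= 0.8:
        return "good"
    if coefficient >= 0.7:
        return "acceptable"
    return "below acceptable"  # label not taken from the source


# Hypothetical loadings for a five-item subscale, for illustration only.
example_omega = omega_from_loadings([0.7, 0.7, 0.7, 0.7, 0.7])
example_label = interpret_reliability(example_omega)
```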
Measurement invariance/equivalence was tested in a specific forward procedure for ordered variables following the model identification approach of Wu and Estabrook [46] and as laid out in detailed guidelines by Svetina et al. [47]. This was done to test hypotheses of RADS-2 being understood and measured equivalently in males and females, in the different age-groups (12-17 and 18-22), and in the clinical and non-clinical samples respectively. The following consecutive steps were performed: (1) Configural invariance tests to evaluate the factor structure, with separate individual CFAs on males/females and in both age-groups (12-17 and 18-22). The normative sample CFA has been reported elsewhere [20]. (2) Threshold invariance tests to evaluate the equivalence of thresholds. (3) Metric invariance tests to evaluate the equivalence of thresholds and factor loadings. (4) Strong/scalar invariance tests to evaluate the equivalence of thresholds, factor loadings, and intercepts.
Models with increasingly stringent constraints were fitted sequentially in this way, and each model was compared to the previous one. Invariance achieved at the scalar level indicates that the scores are not influenced by item-level group differences and that latent means are comparable across groups. In this context, the χ² test has high power and inflated Type I error rates have been observed [47]. Therefore, to determine whether measurement invariance had been achieved at a specific level, the following cut-offs for change in fit indices were considered: ΔCFI = −0.002 and ΔRMSEA = 0.05 for thresholds, and ΔCFI = −0.002 and ΔRMSEA = 0.01 for thresholds and factor loadings [47]. We also considered the Satorra-Bentler scaled χ² difference test statistic, and non-significant p values were interpreted as indicative of model equivalence [48]. This invariance testing procedure is stringent and is required for latent means to be compared across groups [47].
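The decision rule for a single step of this sequence can be written as a short Python sketch. It assumes that a step is supported when CFI drops by no more than 0.002 and RMSEA rises by no more than the step-specific cut-off, with values exactly at the cut-off counted as acceptable; that inclusive reading, the class and function names, and the fit values in the example are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fit:
    """Fit indices of one invariance model."""
    cfi: float
    rmsea: float


def invariance_supported(
    less_constrained: Fit,
    more_constrained: Fit,
    max_delta_rmsea: float,
    min_delta_cfi: float = -0.002,
) -> bool:
    """Compare two nested models using change in CFI and RMSEA.

    Differences are rounded to three decimals, the precision at which fit
    indices are usually reported, to avoid floating-point edge effects.
    """
    delta_cfi = round(more_constrained.cfi - less_constrained.cfi, 3)
    delta_rmsea = round(more_constrained.rmsea - less_constrained.rmsea, 3)
    return delta_cfi >= min_delta_cfi and delta_rmsea <= max_delta_rmsea


# Hypothetical fit values, for illustration only.
configural = Fit(cfi=0.952, rmsea=0.080)
thresholds = Fit(cfi=0.951, rmsea=0.081)
loadings = Fit(cfi=0.949, rmsea=0.082)

threshold_step_ok = invariance_supported(configural, thresholds, max_delta_rmsea=0.05)
loading_step_ok = invariance_supported(thresholds, loadings, max_delta_rmsea=0.01)
```

In this made-up example both steps pass, the second one with a ΔCFI that exactly touches the cut-off.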
To test convergent and discriminant validity we computed Spearman's correlations between RADS-2 and established measures of depression (BYI-D, MADRS, and RCADS-depression) as well as between RADS-2 and constructs that are theoretically distinct from depression (the PROMIS item banks specified above). Correlations between 0.1 and 0.29 were interpreted as small, 0.3-0.49 as medium, and 0.50 and above as large [49].
All analyses were two-tailed and a significance level of 0.05 was used. Statistical uncertainties are presented as 95% confidence intervals, except for RMSEA, where a 90% interval is the current standard. We analyzed data using SPSS statistics version 28 (IBM Corp., Armonk, NY) and R [50]. The structural equation modeling used for CFA and measurement invariance modeling was performed in R using the lavaan package version 0.6-3 [51].
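A minimal Python sketch of Spearman's correlation with the effect-size labels above is given below. It is not the routine used in the study: ties receive average ranks, the label is applied to the absolute value of the coefficient, and the name "negligible" for values under 0.1 is an assumption because the text does not label that range. All function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upward, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (var_x * var_y) ** 0.5


def effect_size_label(rho: float) -> str:
    """Label |rho| using the cut-offs in the text: 0.1 small, 0.3 medium, 0.5 large."""
    size = abs(rho)
    if size >= 0.5:
        return "large"
    if size >= 0.3:
        return "medium"
    if size >= 0.1:
        return "small"
    return "negligible"  # label not taken from the source
```

Applied to a coefficient of −0.18, for example, `effect_size_label` returns "small".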

Descriptive statistics
The percentage of invited participants who did not respond or declined to participate was 75%. Descriptive statistics of the sample are presented in Table 1. Mean age was 16.45 years (SD = 2.47), 95.10% were born in Sweden, and 80.10% were living with one or both parents. In Table 1, the participants' primary ICD-diagnosis at the time of data collection is reported as well as their households' socioeconomic status according to the classification system used by Statistics Sweden [52,53].
Means and standard deviations for all RADS-2 items, subscale scores and total scores are reported by age-group (12-17 years and 18-22 years) and sex (male/female), as well as for the whole sample in Table 2. Corrected item to total subscale-score correlations are also shown in Table 2.
Mean RADS-2 total score for the whole clinical sample was 76.30 (SD = 18.26), median 78.00 (IQR = 26.00). Significant sex differences were found, with females scoring higher than males on all subscales and the total scale, see Table 2 for details. The only significant age-difference was that the 12-17-year-olds scored higher than the 18-22-year-olds on the Anhedonia/Negative affect subscale, see Table 2 for details. In the normative sample, the RADS-2 total score was significantly lower than in the clinical sample, with normative sample mean 59.61 (SD = 15.79), median 58.00, n = 588, total n = 1124, Mann-Whitney U = 237663.00, p < 0.001. Each of the four individual subscale scores was also lower in the non-clinical sample, U-value range 236865.00-244185.50, all at p < 0.001.

Factor structure
Standardized factor loadings for all RADS-2 items are presented in Table 3, both by age-group (12-17 and 18-22) and sex categories, and for the whole sample. In the whole sample standardized factor loading range was 0.34 (item 21) to 0.96 (item 20).

Reliability and measurement invariance
Reliability measures for all subscales as well as for the total scale were acceptable to good, see Table 4.
Individual CFAs in the different age and sex-groups are presented in Table 5. Configural CFAs in the age, sex, and clinical/non-clinical sample groups resulted in acceptable fit indices, with the exception of RMSEA, supporting configural invariance. Invariance with regard to (1) thresholds, (2) thresholds and factor loadings, and (3) thresholds, factor loadings, and intercepts was also demonstrated between age and sex-groups, as well as between the clinical and non-clinical samples (although some scaled χ² difference tests were significant and one ΔCFI value touched the specified cut-off), see Table 5 for details.

Validity
Spearman's correlation coefficients between RADS-2 total scale and RADS-2 subscales, as well as between RADS-2 total scale and validation-instruments are shown in Table 6. The internal correlations of RADS-2 ranged from 0.52 (Anhedonia/negative affect and Somatic complaints subscales) to 0.94 (Negative self-evaluation subscale and total scale).

Table 2. Means and standard deviations for RADS-2 items, subscale scores and total scores by age-group, sex, and for the whole sample. Corrected item to total subscale-score correlations.

Discussion
In this study, we aimed to test the psychometric properties of the Swedish version of RADS-2, an internationally established measure of depression, in a clinical sample. Acceptable fit indices in the CFA supported the four-factor structure demonstrated in previous studies [11,13,16]. We draw this conclusion despite RMSEA not reaching the preferred cut-off value, as all other fit indices did. Also, the SRMR is more accurate than RMSEA for model fit evaluation when data are ordinal and when item-level non-normality is identified [54], as was the case in our sample. The RADS-2 McDonald's Omega ranged from 0.79 to 0.89, which is in line with what has been found previously [10], indicating that the scale is reliable also in this context. We tested four levels of measurement invariance for sex (males and females), age-group (12-17 years and 18-22 years), and the clinical/non-clinical samples. The presence of some significant Satorra-Bentler scaled χ² difference tests is, strictly interpreted, indicative of measurement non-invariance on that comparison. This is, however, a sample size-sensitive test and caution is needed to not mistakenly reject null-hypotheses of model equivalence. Across all levels, there were only minimal changes in RMSEA, well within the acceptable boundaries. The changes in CFI were also small; only at the scalar level for the clinical/non-clinical samples did the change in CFI touch the recommended threshold (−0.002). Given that this threshold is highly conservative, being based on comparisons across many groups (10-20) whereas we had only two, and considering the ΔRMSEA results, we considered measurement invariance to be demonstrated overall. RADS-2 is therefore suitable for use in both males and females, in both 12-17- and 18-22-year-olds, and in clinical as well as non-clinical populations, and comparisons of latent mean-scores between these populations are valid.

Table 3. Standardized factor loadings for RADS-2 items, by age-group, sex, and for the whole sample.
As expected, the RADS-2 scores were higher in the present clinical sample than in the previously published normative sample [20]. Sex-differences were found, with females scoring generally higher than males. We interpret this as a result of the female sample containing more individuals with primary affective diagnoses, reflecting the diagnosis-distribution in the general population [55]. Indeed, the presence of a primary diagnosis of an affective disorder was more common in the females than in the males in our sample, χ² (df = 1, n = 223) = 7.78, p = 0.005, supporting this conclusion. Convergent and discriminant validity were demonstrated with correlations using scales of similar and different constructs.
An unexpected finding was the low correlation between RADS-2 and PROMIS physical activity (−0.18), as associations between depression and physical inactivity have been previously shown [56,57]. In individuals from the present age-range there seems to be poor agreement between self-reported and objectively measured physical activity [58], and self-report bias has been reported particularly in the presence of mental health problems [59].
Another unexpected finding was the low factor loading on item 21 (0.34, see Table 3) rating self-pity: 'I am feeling sorry for myself', a trend that was particularly strong in the females and in the older age-group. This was not seen in our normative sample study [20] and it is therefore unlikely to be culturally or translation-related. It is possible that some individuals with depressive symptoms do not find themselves worthy of love and compassion [60], and not valued enough to feel sorry for. Supporting this is the finding that self-compassion is lower in adolescent girls compared to boys, and lower levels correlate with depression ratings [61].

Limitations and strengths
One limitation of this study is that participation rates were low and therefore the extent to which our sample confidently represents an unselected clinical population is unclear. Most of the participants were recruited from Child and Adolescent Psychiatry, which potentially limits generalizability beyond this context. Also, regarding external validity, only one fourth of the participants were male. This was expected given that internalizing symptoms are less prevalent in young males compared to young females [62], and the sex-distribution in our sample roughly corresponds to that of previous studies [55]. Measurement invariance analysis supported the hypothesis that the scale performs equally well in both sexes, which reduces the impact of the unequal sex distribution in this sample. Another limitation was that only self-rating was performed. Comparing self-rating scores with clinician ratings would have improved the validity-analyses, and clinical assessments would also have increased the reliability of the psychiatric diagnoses that were now extracted from participants' medical records. The low frequency of affective diagnoses in the current sample is a limitation that was likely caused by under-reporting or lack of diagnostic routines. More precise diagnosis-data would have enabled the computation of receiver operating characteristic curves to suggest optimal cutoff-scores for clinical caseness. The limitations of our diagnosis-data as well as the broad inclusion criteria need to be kept in mind when interpreting all results of the study. A potential weakness of RADS-2 is that the scale does not capture all dimensions of depression postulated by the DSM-5, for example attention deficit is not explicitly measured. It is possible that symptoms that are less specific to depression as compared to depressed mood and diminished interest/pleasure [6] have been omitted from the RADS-2 for that reason.

Table 5. Measurement invariance goodness-of-fit for the four-factor model of RADS-2, presented with separate CFAs for sex and age-group, as well as invariance models for sex, age-group, and the clinical/non-clinical sample.
In terms of analysis, validity analyses with simple correlations return only the relationship between variables without quantifying the agreement between the two. More general disadvantages of classical test theory have been elaborated elsewhere [63] and modern item response theory is increasingly being used to evaluate clinical measures in a more sophisticated manner [64]. Therefore, as a future direction we suggest using item-response modeling to evaluate the psychometric properties of RADS-2, even though the dimensionality of the scale would then have to be discarded.
Strengths of this study include the recruitment of patients from a rural to university town area, as well as from different levels of care. Recruiting a mixed clinical sample from an extended age-range including both teenagers and young adults increases the applicability of the scale. This is advantageous given the high frequency of comorbidity [65] and supports future research spanning the age-range from adolescence to young adulthood. As measurement invariance holds for RADS-2 in the whole age-range of this study population, it will be possible to study latent means in this extended population in future studies [47].

Conclusions
RADS-2 displayed good psychometric properties in the current sample, with a supported factor structure, acceptable to good reliability, good validity, and measurement invariance, supporting the view of RADS-2 as a reliable and useful instrument. We conclude that the Swedish version of RADS-2 may be used by clinicians to evaluate symptoms of depression, and by researchers for observational and experimental purposes.