Evaluating the psychometric structure of the Hamilton Rating Scale for Depression pre- and post-treatment in antidepressant randomised trials: Secondary analysis of 6843 individual participants from 20 trials

Background: The 17-item Hamilton Rating Scale for Depression (HRSD-17) is the most popular depression measure in antidepressant clinical trials. Prior evidence indicates poor replicability and inconsistent factorial structure. This has not been studied in pooled randomised trial data, nor has a psychometrically optimal model been developed. Aims: To examine the psychometric properties of the HRSD-17 for pre-treatment and post-treatment clinical trial data in a large pooled database of antidepressant randomised controlled trial participants, and to determine an optimal abbreviated version. Method: Data for 6843 participants were obtained from the data repository Vivli.org and randomly split into groups for exploratory (n = 3421) and confirmatory (n = 3422) factor analysis. Invariance methods were used to assess potential sex differences. Results: The HRSD-17 was psychometrically sub-optimal and non-invariant for all models. High item variances and low variance explained suggested redundancy in each model. EFA failed at baseline and produced four-item models for outcome groups (five-item for placebo-outcome), which were metric but not scalar invariant. Conclusions: In antidepressant trial data, the HRSD-17 was psychometrically inadequate and scores were not sex invariant. Neither full nor abbreviated HRSD models are suitable for use in clinical trial settings and the HRSD's status as the gold standard should be reconsidered.


Introduction
The Hamilton Rating Scale for Depression (HRSD) (Hamilton, 1960) is the most widely used clinician-administered assessment instrument in randomised trials of antidepressant medications (Bagby et al., 2004; Carrozzino et al., 2020). Originally conceived as a 17-item composite scale, the HRSD is currently available in versions ranging from 7 to 31 items in length and has also been published in multiple languages. Typically, only the initial 17 items of the HRSD are used in antidepressant clinical trials. These 17 items are specifically designed to measure melancholic and physical symptoms of depression and vary in their scale of measurement, with eight items measured on a three-point scale (usually from 0 to 2) and nine items measured on a five-point scale (usually from 0 to 4). While it has long been considered the 'gold standard' for the assessment of depression in randomised clinical trials (Bagby et al., 2004), the HRSD predates contemporary conceptualisations of depression, as defined in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-V) (American Psychiatric Association, 2013). Perhaps more significantly, studies have struggled to reconcile the validity, reliability and factorial structure of this test instrument. For example, a review of 70 studies conducted by Bagby et al. (2004) found that, while internal reliability, convergent validity and discriminant validity tended to be adequate, the factor structure was widely inconsistent, with many scale items exhibiting poor content validity and making little contribution to the measurement of depression. These studies were found to have identified as few as two factors (Steinmeyer and Möller, 1992) and as many as eight (O'Brien and Glaudin, 1988). Inconsistencies in the instrument's factor structure persist among more contemporary studies, with, for example, Desseilles et al. (2012) identifying a three-factor model and Broen et al. (2015) identifying a unidimensional six-item model. In addition, with few exceptions (Obeid et al., 2023), there appears to have been limited assessment of measurement invariance by sex.
To our knowledge, no previous research has used pooled individual participant data from multiple clinical trials to explore the factor structure of the HRSD-17 both pre- and post-antidepressant treatment. The primary aim of this study was to examine the psychometric properties of the HRSD-17 for antidepressant clinical trial participants at baseline (pre-treatment) and outcome (post-treatment), as well as individually for placebo and treatment groups at outcome (post-treatment), and to assess measurement invariance by sex. A secondary aim was to determine if an optimal abbreviated version of the HRSD-17 could be found.

Method
This study analysed secondary data obtained from the online data repository 'Vivli.org' (2024). Ethical approval for this study was obtained from the Royal College of Surgeons in Ireland Research Ethics Committee (Record ID: 212560819). Consent was not a requisite for this study, as analyses were conducted using secondary data and participant consent for the use of their data in research subsequent to the original study was obtained by respective study sponsors.

Inclusion and exclusion criteria
A list of available studies that used the HRSD-17 was prepared. Studies that used longer versions of the HRSD were included in a search of the Vivli.org repository; however, only data for the first 17 items were retained for analyses. Inclusion criteria were established (Doyle et al., 2023), which specified phase II, III and IV randomised controlled trials of adult (18+ years old) participants being treated for any major or minor depressive disorder using any antidepressant medication with a placebo comparator. Assessment intervals used for analyses were baseline and eight weeks (or within a range of 4-12 weeks) for outcome, as per Cipriani et al. (2018). Exclusion criteria were all studies outside of these parameters.
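The outcome-window rule can be made concrete with a short sketch. It is illustrative only: the text does not say how an outcome visit was chosen when several fall inside the 4-12 week window, so the "closest to week 8" rule below is an assumption, and `pick_outcome_week` and its arguments are hypothetical names.

```python
def pick_outcome_week(visit_weeks, target=8, window=(4, 12)):
    """Return the visit week treated as 'outcome', or None if none qualifies.

    Assumption (not stated in the source): among visits inside the window,
    the one closest to the target week is chosen; ties go to the earlier visit.
    """
    low, high = window
    eligible = [week for week in visit_weeks if low <= week <= high]
    if not eligible:
        return None
    # Sort key: distance from the target first, then the week itself.
    return min(eligible, key=lambda week: (abs(week - target), week))
```

A trial with assessments at weeks 1, 2, 4 and 6 would contribute its week-6 visit under this rule, while a trial with no visit between weeks 4 and 12 would contribute no outcome data.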

Datasets
A total of 20 studies were eligible for inclusion (Supplementary Table 1). After screening and cleaning using listwise deletion of missing values, an overall sample of n = 6843 was achieved. To facilitate exploratory and confirmatory factor analysis, the rand() function in Microsoft Excel was used to randomly divide the data into EFA and CFA groups (Doyle et al., 2023). Randomisation was performed on a study-by-study basis according to treatment groups to ensure that the EFA and CFA data-groups comprised an equal representation of placebo and treatment group patients from each study. This resulted in an EFA dataset sample of n = 3421 and a CFA dataset sample of n = 3422. Table 1 illustrates the characteristics of the overall sample, as well as the individual EFA and CFA datasets, according to age, sex and treatment received. Supplementary Table 2 provides the distribution of depression severity ratings for the baseline and outcome groups in relation to both the EFA and CFA datasets.
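The split described above was carried out with rand() in Microsoft Excel. The following is a minimal Python sketch of the same idea (shuffling within each study-by-arm stratum and halving), not the authors' procedure. The record keys `study` and `arm`, the function name and the seed are all hypothetical, and sending the extra participant of an odd-sized stratum to the CFA half is an arbitrary choice.

```python
import random
from collections import defaultdict

def stratified_split(records, seed=2023):
    """Split records into EFA and CFA halves within each (study, arm) stratum."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    strata = defaultdict(list)
    for record in records:
        strata[(record["study"], record["arm"])].append(record)

    efa, cfa = [], []
    for key in sorted(strata):
        members = strata[key][:]
        rng.shuffle(members)
        half = len(members) // 2
        efa.extend(members[:half])   # first half of the shuffled stratum
        cfa.extend(members[half:])   # remainder (one extra if the stratum is odd)
    return efa, cfa
```

Because the halving happens inside every stratum, each study contributes placebo and treatment participants to both datasets in near-equal numbers.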

Analysis
Analyses were conducted using R v4.1.1 (R Core Team, 2013). Uni- and multivariate normality were assessed by conducting Shapiro-Wilk tests and Henze-Zirkler's tests, respectively, using the MVN (v5.9) package (Korkmaz et al., 2022). The factorability of each data group was determined using Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (MSA), which is generally accepted to have a minimum threshold of 0.60 (Watson, 2017).
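For readers unfamiliar with the KMO statistic, the sketch below computes the overall MSA from a correlation matrix using the usual textbook definition: the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations. It is a plain-Python illustration under that definition, not the R routine used in the study, and the helper names `invert` and `kmo` are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    n = len(corr)
    inv = invert(corr)
    sum_r2 = 0.0  # squared zero-order correlations (off-diagonal)
    sum_p2 = 0.0  # squared partial correlations (off-diagonal)
    for i in range(n):
        for j in range(n):
            if i != j:
                partial = -inv[i][j] / (inv[i][i] * inv[j][j]) ** 0.5
                sum_r2 += corr[i][j] ** 2
                sum_p2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_p2)
```

Values near 1 indicate that correlations are largely shared among items, whereas values below the 0.60 threshold indicate that partial correlations are large relative to the raw correlations, so a common-factor model is less likely to be appropriate.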
The HRSD-17 is most commonly used in clinical trials as a unidimensional scale, with outcome analyses pertaining to sum scores of all 17 items (Carrozzino et al., 2020). Therefore, an initial confirmatory factor analysis was conducted to assess the psychometric properties of this assumed unidimensional 17-item model. This was followed by a series of exploratory and confirmatory factor analyses to determine optimum dimensionality and factor structure. Parallel analysis (PA) was conducted using the 'fa.parallel' function of the psych (v2.1.6) package (Revelle, 2022) to determine the dimensionality of the baseline and outcome groups using the EFA data. The factor retention criterion was the number of actual data eigenvalues that were larger than their simulated data counterparts (Scott et al., 1995). The 'fa' function in psych was used to conduct exploratory factor analysis, which adopted maximum likelihood estimation and an oblimin rotation method. Successive iterations of EFA were conducted to remove (i) factors with fewer than three item loadings, (ii) items with factor loadings < 0.30, and (iii) items with communalities < 0.40, as recommended by Costello and Osborne (2005). This was followed by confirmatory factor analysis of the EFA-specified models, which was conducted using lavaan (v0.6-9) (Rosseel et al., 2022), and also adopted maximum likelihood estimation and oblimin rotation. The performance of the CFA models was determined by examining absolute fit indices, including the Standardised Root Mean Square Residual (SRMR) and the Root Mean Square Error of Approximation (RMSEA), which are considered acceptable at < 0.08 (Schermelleh-Engel and Moosbrugger, 2008). Relative fit indices assessed included the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI), which are acceptable at > 0.95 (Smith and McMillan, 2001). Reliability of optimal baseline and outcome models, as well as individual placebo at outcome and treatment at outcome models, was calculated as McDonald's Omega, which is typically considered acceptable at > 0.70 (Hayes and Jacob, 2020), using MBESS (v4.9.2) (Kelley, 2022).
Additionally, invariance methods were used to examine potential differences in models between men and women. The 'lavTestLRT' function of lavaan was used to assess metric and scalar invariance by comparing a multi-group configural model, which estimates independent factor loadings and intercepts for male and female patients, to multi-group models where (i) factor loadings were constrained to be equal across sexes (metric invariance), and (ii) factor loadings and intercepts were constrained to be equal across sexes (scalar invariance). Single- and multi-group models were explored. The influence of multi-group modelling on model fit was assessed by comparing absolute and relative fit indices of the multi-group model to those of the original single-group model. Criteria recommended by Chen (2007) were observed, which propose change thresholds of 0.015 for RMSEA, 0.030 for SRMR, and 0.010 for CFI. Chen does not specify a threshold for TLI. However, TLI was assessed in accordance with the same threshold as CFI (0.010), as these indices share the same cut-off point for acceptable fit.
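The change criteria can be expressed as a small comparison routine. This is a sketch under stated assumptions: the thresholds are those quoted in the text, but the direction rule (only a rise in RMSEA or SRMR, or a fall in CFI or TLI, counts as deterioration) is an interpretation, and `worsened_indices` and the fit values in the usage note are hypothetical.

```python
# Change thresholds as quoted in the text (Chen, 2007); TLI borrows the CFI value.
THRESHOLDS = {"rmsea": 0.015, "srmr": 0.030, "cfi": 0.010, "tli": 0.010}
HIGHER_IS_WORSE = {"rmsea", "srmr"}

def worsened_indices(reference_fit, comparison_fit):
    """Return the indices that deteriorate by more than their threshold.

    Assumption: deterioration means an increase for RMSEA/SRMR and a
    decrease for CFI/TLI when moving from the reference to the comparison model.
    """
    flagged = []
    for index, limit in THRESHOLDS.items():
        change = comparison_fit[index] - reference_fit[index]
        deterioration = change if index in HIGHER_IS_WORSE else -change
        if deterioration > limit:
            flagged.append(index)
    return flagged
```

For example, with made-up fit values, comparing a reference model (RMSEA 0.050, SRMR 0.030, CFI 0.960, TLI 0.950) with a more constrained model (RMSEA 0.052, SRMR 0.033, CFI 0.940, TLI 0.945) would flag only CFI, since it falls by 0.020.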

Normality testing
Uni- and multivariate normality testing indicated that none of the groups satisfied the assumption of normality in either the EFA or CFA datasets. As such, a Satorra-Bentler chi-square correction was used to insulate χ² against deviations from multivariate normality, and robust test statistics were observed, where appropriate. The factorability of each group in the EFA dataset was assessed. At baseline, Bartlett's test of sphericity was significant (χ²(136) = 4341.06, p < .001). However, the KMO test suggested the baseline data may have been subthreshold for factor analysis (MSA = 0.59). Factorability test outcomes for the combined outcome data (χ²(136) = 23,373.81, p < .001, MSA = 0.91), as well as placebo at outcome (χ²(136) = 7871.16, p < .001, MSA = 0.93) and treatment at outcome (χ²(136) = 15,444.45, p < .001, MSA = 0.93), indicated that these data groups were suitable for factor analysis.
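The 136 degrees of freedom reported for each Bartlett test follow directly from the number of items. As a reminder, the standard textbook form of the test (not taken from the source) is shown below, where p is the number of items, n the number of participants and R the sample correlation matrix; these symbols are introduced here for illustration.

```latex
\chi^2 = -\left[(n - 1) - \frac{2p + 5}{6}\right] \ln\lvert \mathbf{R} \rvert,
\qquad
df = \frac{p(p - 1)}{2} = \frac{17 \times 16}{2} = 136 .
```

With samples in the thousands the statistic is almost always significant, which is why the KMO value carries most of the weight when judging factorability here.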

Confirmatory factor analysis of the assumed 17-item HRSD model
Confirmatory factor analyses of the 17-item HRSD were conducted for each group. For baseline and outcome groups, absolute fit indices were acceptable but relative fit indices were sub-optimal. Reliability at baseline was poor (ωt = 0.26 [0.212, 0.299]) but was acceptable for combined outcome (ωt = 0.88 [0.876, 0.887]), placebo at outcome (ωt = 0.88 [0.873, 0.892]), and treatment at outcome (ωt = 0.85 [0.839, 0.859]). The explained variance for each model was low, with baseline performing particularly poorly. There were also a number of high individual variances in each model. Standardised factor loadings, variances and intercepts are presented in Table 2, and fit indices, reliability coefficients and explained variances are presented in Table 3. The relatively low explained variance coupled with numerous high individual variances across all models suggested that there was redundancy in the 17-item model of the HRSD. This was further examined with exploratory factor analysis.
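To show how weak loadings and high residual variances translate into low reliability, the sketch below uses the simple one-factor form of McDonald's omega: the squared sum of loadings divided by that quantity plus the summed residual variances. This is a simplification for illustration; the coefficients reported here were obtained with MBESS, and the function name and loadings in the usage note are hypothetical.

```python
def omega_unidimensional(loadings, residual_variances):
    """McDonald's omega for a one-factor model with unit factor variance."""
    if len(loadings) != len(residual_variances):
        raise ValueError("one residual variance is needed per loading")
    common = sum(loadings) ** 2  # variance attributable to the common factor
    return common / (common + sum(residual_variances))
```

Seventeen hypothetical items that each load 0.15 on the factor (residual variance 0.9775) give an omega of about 0.28, whereas four items loading 0.70 (residual variance 0.51) give about 0.79. A long scale of weakly related items can therefore be less reliable than a short scale of strongly related ones.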

Invariance testing of the 17-item HRSD
Invariance analyses of the 17-item HRSD indicated that, with the exception of metric invariance for the placebo at outcome group, all models were both metric and scalar non-invariant. This suggests differential model performance in measuring depression for men and women and that mean scores for men and women should not be directly compared. Invariance test statistics are presented in Supplementary Table 3 and fit indices for the multi-group configural model for each data group can be seen in Table 3. Inspection of fit indices suggests that the multi-group baseline model may perform moderately better than its single-group counterpart. For treatment at outcome, the multi-group TLI is 0.15 higher than its single-group counterpart but the CFI is 0.13 lower. For combined outcome and placebo at outcome, fit is comparable between single- and multi-group models. Factor loadings, variances and intercepts are provided in Supplementary Table 4.

Exploratory factor analysis
As CFA of the assumed 17-item unidimensional model was problematic, EFA was conducted to explore abbreviated models. This began with EFA of all 17 items. Based on the actual data eigenvalue versus simulated data eigenvalue criterion, parallel analyses initially suggested six factors at baseline and placebo at outcome, and five factors for combined outcome and treatment at outcome (see Supplementary Fig. 1). However, in each case, it was noted that only one factor presented with an eigenvalue > 1, which is the acceptable threshold using the Kaiser criterion (Scott et al., 1995). Nevertheless, EFA began by exploring the number of factors specified by PA. Fit indices and reliability coefficients for initial 5- and 6-factor EFA models are presented in Table 4. The baseline model was found to be sub-optimal, while outcome models demonstrated acceptable fit and reliability. The baseline model retained nine items across six factors and outcome models retained between eleven and fourteen items. However, all models contained at least two factors with only one loading, as well as several items with sub-optimal communalities (see Supplementary Table 5). Further iterations of EFA were conducted to remove single-item factors and redundant items. At baseline, data reduction resulted in a unidimensional model (see Table 4).
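The three retention rules set out in the Analysis section can be sketched as a single screening pass over an EFA solution. This is an illustration only: in practice the model is re-estimated after each removal, which the sketch does not do, and it takes communalities as given rather than deriving them. The function name, the input layout and the item names in the usage note are hypothetical.

```python
def prune_once(items, min_loading=0.30, min_communality=0.40, min_items_per_factor=3):
    """Apply the retention rules once and return the surviving factors.

    `items` maps an item name to (list of factor loadings, communality).
    The result maps a factor index to the names of the items kept on it.
    """
    # Rules (ii) and (iii): drop weakly loading or low-communality items.
    kept = {
        name: loadings
        for name, (loadings, communality) in items.items()
        if max(abs(value) for value in loadings) >= min_loading
        and communality >= min_communality
    }
    # Assign each surviving item to the factor on which it loads most strongly.
    members = {}
    for name, loadings in kept.items():
        factor = max(range(len(loadings)), key=lambda f: abs(loadings[f]))
        members.setdefault(factor, []).append(name)
    # Rule (i): discard factors left with fewer than three items.
    return {factor: names for factor, names in members.items()
            if len(names) >= min_items_per_factor}
```

Given seven made-up items, of which one loads below 0.30, one has a communality below 0.40, three load on a first factor and two on a second, the pass keeps only the three-item first factor. This shows how applying the communality and items-per-factor rules together can shrink a 17-item solution quickly.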

Confirmatory factor analysis of the abbreviated HRSD
Confirmatory factor analyses of the EFA-specified models were then conducted. As EFA failed to identify a viable model at baseline, and the four-item unidimensional model was noted to be relatively stable across outcome data groups, a CFA of this model was examined for the baseline group. While absolute fit indices at baseline were acceptable (SRMR = 0.021, RMSEA = 0.055 [0.037, 0.075]), relative fit indices were sub-optimal (CFI = 0.909, TLI = 0.728) and reliability was poor (ωt = 0.31 [0.271, 0.341]). Fit indices and reliability for the outcome data were acceptable, although the upper bounds of the RMSEA confidence intervals for both placebo at outcome and treatment at outcome were above the 0.08 threshold. Confirmatory models for each group, including standardised factor loadings, variances and intercepts, are presented in Table 5, and fit indices and reliability coefficients can be seen in Table 3. As with the 17-item model, reliability of the abbreviated baseline model was found to be poor (ωt = 0.31 [0.271, 0.341]), while combined outcome (ωt = 0.85 [0.844, 0.860]), placebo at outcome (ωt = 0.86 [0.851, 0.876]) and treatment at outcome (ωt = 0.85 [0.839, 0.859]) were acceptable. These analyses suggested that the unidimensional four-item model is viable for the combined outcome and treatment at outcome groups, while a five-item model provided optimal fit for the placebo at outcome group. The four-item combined outcome model demonstrated poor fit with baseline data.
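The split verdict at baseline (acceptable absolute fit, sub-optimal relative fit) can be reproduced by applying the cut-offs from the Analysis section to the values reported above. The helper below is a minimal sketch; `fit_verdict` is a hypothetical name, and treating both indices in each pair as jointly required is an assumption.

```python
def fit_verdict(srmr, rmsea, cfi, tli):
    """Classify absolute and relative fit using the cut-offs quoted in the text."""
    return {
        "absolute_ok": srmr < 0.08 and rmsea < 0.08,  # SRMR and RMSEA below 0.08
        "relative_ok": cfi > 0.95 and tli > 0.95,     # CFI and TLI above 0.95
    }

# Values reported above for the four-item model fitted to the baseline group.
baseline = fit_verdict(srmr=0.021, rmsea=0.055, cfi=0.909, tli=0.728)
```

Here `baseline` evaluates to absolute fit acceptable and relative fit not acceptable, matching the description in the text.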

Invariance testing of abbreviated HRSD
The results of invariance testing were consistent across baseline and all outcome groups in suggesting that metric invariance was supported, while scalar invariance was not, meaning item or mean scores for men and women should not be directly compared. However, it should be noted that the difference in CFI values between the configural and metric models at baseline was above the 0.010 threshold. Invariance test statistics are presented in Supplementary Table 7 and fit indices for the configural model for each data group can be seen in Table 3. Contrary to the 17-item model, both relative fit indices indicated that the single-group baseline model performed moderately better than its multi-group counterpart. Single- versus multi-group model performance was comparable across all other models. Factor loadings, variances and intercepts are provided in Supplementary Table 8.

Findings
To our knowledge, this is the first study to evaluate the psychometric performance of the HRSD-17 at both baseline (pre-treatment) and 8-week (post- or peri-treatment) outcome by pooling individual data from thousands of participants across multiple antidepressant treatment trials. The results indicated sub-optimal performance of the 17-item HRSD, with inadequate relative fit indices for outcome models and all models presenting with several poor factor loadings and high item variances. Variance explained by the 17-item models was also relatively low and, with the exception of metric invariance for placebo at outcome, the HRSD-17 presented as sex non-invariant. EFA of the 17 items resulted in configurally non-invariant models of between five and six factors, retaining between nine and 14 items. Only the outcome models satisfied all fit index criteria, while all models contained items with sub-threshold communalities. Additional iterations of EFA were performed in an attempt to resolve these issues, which further highlighted the poor structural validity of the HRSD-17. This was noted at baseline in particular, as no adequate model could be found. A unidimensional four-item model offered best fit for the combined outcome (post-treatment) group and treatment at outcome group, which consisted of 'depressed mood', 'feelings of guilt', 'work and activities' and 'anxiety: psychic'. A five-item model was found for the placebo at outcome group, which included the aforementioned items, as well as 'somatic symptoms: general'. These models had adequate reliability. When the outcome model was examined for baseline, it demonstrated sub-optimal fit and poor reliability. Invariance methods indicated that metric invariance was observed for all groups, while scalar invariance was absent. As such, item and mean depression scores may not be comparable between male and female patients. While the four-item model demonstrates better psychometric properties than the HRSD-17, significant inconsistencies remain, bringing into question the 'gold standard' status of the HRSD.
Exploratory factor analysis failed to identify a viable model at baseline, with the exception of three sleep disturbance items that loaded on a common factor. Finding this factor supports an analysis conducted by Bagby et al. (2004) of 15 factor studies of the HRSD, which observed that these three items did indeed form a sleep disturbance factor. However, these items alone are not a valid representation of a depression construct, and, in the absence of any other factors, the EFA model at baseline was abandoned. Applying the four-item combined outcome model yielded acceptable absolute fit indices, but sub-optimal relative fit indices and poor reliability. It is noteworthy that the item variances at baseline were very high and the KMO test results suggested the sample in this group might be too heterogeneous for EFA. Absolute fit indices can be sensitive to such high variances. Heene et al. (2011) demonstrated that high unique variances can decrease RMSEA and SRMR values, suggesting good fit, even when misspecification is present in the model. This may explain the acceptable absolute fit indices coupled with poor CFI, masking the degree to which the model is likely a poor fit for the data.
The alternative four-item model found was relatively consistent at eight-week outcome. In contrast to the EFA-specified baseline model, and indeed the available literature (Bagby et al., 2004; Cole et al., 2004), the sleep disturbance items did not feature in any of the outcome models. This is problematic, as sleep disturbance is among the most common and consequential residual symptoms after otherwise successful treatment of major depression (Carney et al., 2023). Another point of distinction noted in this study is the relatively small number of items retained after EFA. The number of items retained in factor analytical studies has tended to be in the high teens, with several retaining the original 17 items (Maier et al., 1985; Akdemir et al., 2001; Broen et al., 2015). However, it is apparent that across these studies, communalities are not used as a data reduction criterion and factors with fewer than three item loadings tended to be retained, which goes against recommended practice (Costello and Osborne, 2005). While we used a liberal factor loading threshold of > 0.30, which is less strict than the often-used threshold of > 0.40, the small number of items retained in this study can be explained by the more stringent inclusion of the communality and factor item number criteria. Interestingly, the four-item model closely reflects the six-item HRSD-6 (or melancholia subscale) noted in previous studies (Bech et al., 1975; Lecrubier and Bech, 2007). A review conducted by Timmerby et al. (2017) noted that the six-item version of the HRSD outperformed the 17-item version in terms of scalability, transferability and responsiveness.
The potential sex difference in HRSD-17 performance is also worth comment. One prior study has assessed sex/gender invariance in relation to the HRSD (Obeid et al., 2023), which used an abbreviated seven-item version of the HRSD consisting of the four items retained in the present study, as well as 'anxiety: somatic', 'somatic symptoms: general' and 'suicide'. Obeid et al. reported observing both metric and scalar invariance across sexes, which is contrary to the finding of only metric invariance in the present study. It should be noted that Obeid et al. conducted their invariance analyses using a community sample obtained via snowball sampling, and so results may not be generalisable to clinical settings. In contrast to these findings, the absence of scalar invariance noted in the present study of a large multi-study clinical sample could have implications for how the HRSD could be utilised in antidepressant clinical trials, or indeed other settings. That the mean differences in the latent factor fail to fully capture the mean differences in the item shared variance suggests that male and female depression scores for the four-item HRSD may not be meaningfully compared (Putnick and Bornstein, 2016).
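The consequence of scalar non-invariance can be made explicit with the standard multi-group factor model. The notation below is conventional rather than taken from the source: x is the score on item i in group g (m for men, f for women), τ an item intercept, λ a loading, ξ the latent depression factor with mean κ, and ε a residual.

```latex
\begin{align}
x_{ig} &= \tau_{ig} + \lambda_{ig}\,\xi_g + \varepsilon_{ig} \\
\text{metric invariance:}\quad & \lambda_{im} = \lambda_{if} = \lambda_i \\
\text{scalar invariance:}\quad & \lambda_{im} = \lambda_{if} \ \text{and}\ \tau_{im} = \tau_{if} \\
E[x_{im}] - E[x_{if}] &= (\tau_{im} - \tau_{if}) + \lambda_i\,(\kappa_m - \kappa_f)
\end{align}
```

Under metric invariance alone, the last line shows that an observed difference in item means mixes an intercept difference with the latent mean difference. Only when the intercepts are also equal does the observed difference reduce to the loading multiplied by the latent mean difference, which is what licenses comparing scores between men and women.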

Implications and future research
Although the HRSD is considered the 'gold standard' for the assessment of depression, this study found that there is significant redundancy in the HRSD-17, with several items demonstrating high variances and relatively little variance explained. This was further evidenced with 13 items being removed during EFA. The HRSD-17 was also found to be metric and scalar non-invariant with regard to sex. The abbreviated four-item model found in this study should also be considered with caution. Fit indices and reliability were generally acceptable. This model also explained significantly more variance than the 17-item model and was metric invariant with regard to sex. However, it performed poorly at baseline and was found to be scalar non-invariant. The invariance issue can be addressed by adopting multi-group CFA techniques, which may be particularly pertinent if subsequent analyses intend to utilise factor scores that are informed by the confirmatory model (Brown and Harris, 2015). To this end, inspection of fit index differences between single- and multi-group models in this study suggests that adopting a multi-group approach would likely not diminish model fit. A more pressing concern is the inability of EFA to identify a viable model of depression at baseline, which raises the question as to what is in fact being measured at baseline. Indeed, poor factorial performance could be reflective of the fact that the HRSD was designed over 60 years ago, and its constituent items may not be representative of depression as currently defined in the DSM-V (American Psychiatric Association, 2013). Moreover, the same concerns regarding content validity of overly abbreviated measures that have been raised against the HRSD-6 (Fried, 2016) can be raised against the four-item model found here. This requires further investigation. However, given the mixed results of previous validity and reliability analyses (Bagby et al., 2004), and the noted inconsistencies in factor structure (O'Brien and Glaudin, 1988; Steinmeyer and Möller, 1992; Desseilles et al., 2012; Broen et al., 2015), it is perhaps unsurprising that a viable model failed to converge with the more stringent criteria adopted for this study.
Given the inconsistent findings available in the literature, and the poor psychometric performance of the 17-item HRSD and inconsistencies inherent in the abbreviated model evident here, it may be time to abandon the HRSD in favour of alternative instruments. In this case, measures such as the Montgomery-Asberg Depression Rating Scale (Quilty et al., 2013), the Beck Depression Inventory (Beck et al., 1988), the Patient Health Questionnaire (Kroenke et al., 2001) or the Patient-Reported Outcome Measurement Information System depression scale (Nolte et al., 2019) could be subjected to multi-trial evaluation, as conducted in this study. Ultimately, the question of whether applying psychometric analyses improves or invalidates the results of antidepressant trials is an open one and is currently being addressed (Doyle et al., 2023).

Strengths and limitations
Key strengths of this study include its large multi-study sample, and the application of factor analysis at baseline and post/peri treatment, including separation of post-treatment placebo and treatment groups. Limitations include the non-probabilistic selection of included trials.
D. Byrne et al.
Although the use of secondary data allowed us to achieve a large sample size, this sample is specific to the international clinical trial database, Vivli.org. Some studies used had characteristics (e.g., female patients only, geriatric patients only, unique geo-location) that might have introduced heterogeneity, or otherwise skewed the sample representativeness. The results of these analyses may not generalise to other data sources, or indeed other psychometric modelling techniques, measures or depression therapies.

Conclusion
This study noted sub-optimal psychometric performance and significant redundancy in the HRSD-17. Serious inconsistencies in models resulting from exploratory factor analyses were found. A unidimensional four-item model performed well with the combined outcome and treatment at outcome groups. However, this model performed poorly at baseline, and a five-item model was specified as optimal for placebo at outcome. The four-item model was also found to be scalar non-invariant in relation to men and women. In light of these issues, and those found in the existing literature, it is evident that the HRSD-17 is not suitable for use in antidepressant clinical trials.

Declarations
This publication is based on research using data from data contributors Takeda, Pfizer, GlaxoSmithKline and Eli Lilly and Co., that has been made available through Vivli, Inc. Vivli has not contributed to or approved, and is not in any way responsible for, the contents of this publication.

Table 1
Overall, EFA group and CFA group sample by age and sex.

Table 3
CFA fit indices and reliability coefficients for 17- and 4-item single-group and 4-item multi-group models.

The unidimensional baseline model consisted of 'insomnia: early in the night', 'insomnia: middle of the night', and 'insomnia: early in the morning', which are the three sleep disturbance items in the HRSD. These items could theoretically be triangulated to form a viable factor, but this would not be representative of a latent depression variable, and so it was concluded that EFA of the baseline data failed to provide a valid model. For the combined outcome group (placebo and antidepressant together), data yielded a unidimensional, four-item model comprising 'depressed mood', 'feelings of guilt', 'work and activities' and 'anxiety: psychic'. This was replicated in the treatment at outcome group only. The placebo at outcome group produced a unidimensional five-item model, which added 'somatic symptoms: general' to the items loading in the other outcome models. The four-item model specified for the other groups was examined for placebo at outcome, demonstrating a marginal increase in the variance explained from 0.55 to 0.58, with RMSEA inflated from 0.092 to 0.119. This increased RMSEA was potentially detrimental to CFA, and so the five-item model was retained for further analyses. At this point, scale reliability was assessed for each EFA data group. As mentioned, no viable model was identified at baseline. The omega coefficient for the combined outcome group was ωt = 0.85 [0.842, 0.858], with placebo at outcome (ωt = 0.87 [0.854, 0.878]) and treatment at outcome (ωt = 0.84 [0.834, 0.855]) having similar reliability. Standardised factor loadings and communalities for each model can be seen in Supplementary Table 6, and fit indices, variance explained and reliability coefficients can be seen in Table 4.

df = Degrees of freedom. srmr = Standardised root mean square residual. rmsea = Root mean square error of approximation. tli = Tucker-Lewis index. cfi = Comparative fit index. aic = Akaike information criterion. bic = Bayesian information criterion. ωt = McDonald's Omega. r² = Variance explained.

χ² = Chi-squared. df = Degrees of freedom. srmr = Standardised root mean square residual. rmsea = Root mean square error of approximation. tli = Tucker-Lewis index. r² = Variance explained. ωt = McDonald's Omega. *Reliability analysis was not conducted at baseline, as items did not reflect a latent depression trait.

Table 5
CFA factor loadings, variances and intercepts.