An analysis of correlations among four outcome scales employed in clinical trials of patients with major depressive disorder

Background: The 17-item Hamilton Depression Rating Scale (HAM-D17) remains the 'gold standard' for measuring treatment outcomes in clinical trials of depressed patients. The Montgomery-Åsberg Depression Rating Scale (MADRS), Clinical Global Impressions-Severity (CGI-S) and -Improvement (CGI-I) scales are also widely used.
Objective: This analysis of data from 22 double-blind, placebo-controlled clinical studies of venlafaxine in adult patients with major depressive disorder assessed correlations among these 4 scales.
Methods: Changes from baseline for MADRS, HAM-D17 and CGI-S, and end point CGI-I scores and response (≥50% decrease from baseline HAM-D17 or MADRS, or CGI-S or CGI-I score ≤2) were analysed. Pearson correlation coefficients were calculated for all pairs of the four scales (HAM-D17/MADRS, HAM-D17/CGI-S, HAM-D17/CGI-I, MADRS/CGI-S, MADRS/CGI-I, CGI-S/CGI-I) at different time points. Effect sizes were calculated using the Cohen d.
Results: Correlations were significant at all time points (p < 0.0001), increased over the course of treatment, and were similar across treatment groups. Effect sizes ranged from 0.31 to 0.42; MADRS and CGI-I effect sizes were slightly greater than those for HAM-D17 or CGI-S for both continuous measures and response.
Conclusion: Although MADRS and CGI-I were more sensitive to treatment effects, HAM-D17, MADRS, CGI-S and CGI-I scores present a consistent picture of response to venlafaxine treatment.


Background
Many instruments have been developed to measure outcomes in studies of patients with major depressive disorder (MDD). Among them, the Hamilton Depression Rating Scale (HAM-D) [1], the Montgomery-Åsberg Depression Rating Scale (MADRS) [2], and the Clinical Global Impressions-Severity scale (CGI-S) and -Improvement scale (CGI-I) [3] are investigator-rated instruments; the CGI-I differs from the other three scales in that it assesses the degree of symptom improvement rather than absolute severity of symptoms or specific pathology [3]. The HAM-D and the MADRS scales measure depressive symptoms, whereas the CGI-S and CGI-I assess global outcome.
The HAM-D was developed in the 1950s to evaluate efficacy of first-generation antidepressants; the 17-item HAM-D (HAM-D17) has been accepted by many as the standard for measuring therapeutic efficacy in clinical trials [1]. However, one problem with the HAM-D is that individual items are often multidimensional, with poor inter-rater and retest reliability. As a result, the HAM-D total score can be ambiguous [4]. The MADRS was designed to address some of the limitations of the HAM-D. Specifically, the MADRS may be more sensitive to treatment-related changes in depression and may better distinguish responders from non-responders [2,5]. Recent analyses have confirmed the correlation between HAM-D, MADRS, and CGI-S in a systematic literature review and two retrospective chart reviews [4][5][6].
The present analysis was undertaken in a large dataset of 22 double-blind, placebo-controlled clinical studies of venlafaxine in patients with MDD to identify and assess correlations among these 4 widely used rating scales: the HAM-D17, MADRS, CGI-S, and CGI-I. [10,23], and one study (360) enrolled patients with concomitant anxiety [21]. Study durations ranged from 4 weeks to 52 weeks.

Statistical analysis
Continuous outcomes were defined as total change from baseline for MADRS and HAM-D17, change in score from baseline for CGI-S, and end point scores for CGI-I. These scores were calculated using observed data for the total patient populations at weeks 1, 2, 3, 4, 6, and 8 (for studies less than 8 weeks in duration, data were included for the number of weeks available), and for the final on-therapy (FOT) visit. HAM-D17, MADRS, CGI-S, and CGI-I The four scales also were used to determine binary outcomes (response or no response). For CGI-I and CGI-S, response was defined as scores ≤2, and for HAM-D17 and MADRS total scores, response was defined as a 50% or greater decrease from baseline. Pearson correlation coefficients were determined for all possible pairs of the four scales for binary outcomes at weeks 1 through 8. Correlations were calculated for the FOT scores for the total population, and separately for those in the venlafaxine and placebo arms.
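The response definitions above can be written out as a minimal sketch. The function names and the example scores are hypothetical and are not taken from the study data; only the two thresholds (a 50% or greater decrease from baseline, and a CGI score ≤2) come from the text.

```python
def is_responder_total(baseline: int, endpoint: int) -> bool:
    """Response on HAM-D17 or MADRS: 50% or greater decrease from baseline.

    Hypothetical helper; integer arithmetic avoids rounding issues at
    exactly 50%.
    """
    if baseline <= 0:
        raise ValueError("baseline total score must be positive")
    return 2 * (baseline - endpoint) >= baseline


def is_responder_cgi(score: int) -> bool:
    """Response on CGI-S or CGI-I: score of 2 or lower (scale range 1 to 7)."""
    if not 1 <= score <= 7:
        raise ValueError("CGI scores range from 1 to 7")
    return score <= 2


# Made-up patient: HAM-D17 total falls from 24 to 12 (exactly 50%),
# with an end point CGI-I of 3.
hamd_response = is_responder_total(24, 12)
cgi_i_response = is_responder_cgi(3)
```

In this made-up case the patient counts as a responder on the HAM-D17 but not on the CGI-I, which is the kind of disagreement the pairwise correlations between binary outcomes are meant to quantify.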
The Pearson product-moment correlation coefficient (r), a measure of the tendency of two variables to increase or decrease together, was used to measure the correlation between each pair of efficacy variables measured on the same subject. Effect sizes (Cohen d) were calculated to measure the magnitude of the treatment effect at the FOT evaluation for the pooled data and individually for each study. statistically similar for the total population, the venlafaxine group, and the placebo group.
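For readers less familiar with the two statistics named above, the following is a minimal sketch of how they are commonly computed. It assumes the pooled-standard-deviation form of the Cohen d, which the text does not specify, and the function names and sample values are illustrative only.

```python
from math import sqrt
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson product-moment correlation between paired observations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cohen_d(treated, control):
    """Cohen d using the pooled sample standard deviation (an assumption)."""
    n1, n2 = len(treated), len(control)
    pooled_var = ((n1 - 1) * variance(treated)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treated) - mean(control)) / sqrt(pooled_var)


# Made-up improvements (points of decrease from baseline) in two arms.
d_example = cohen_d([2, 4, 6], [1, 3, 5])
```

Here improvement is expressed as a positive decrease from baseline, so a positive d favours the treated group; with change scores expressed as negative numbers the sign would be reversed.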

Results
Correlation coefficients between binary outcomes (that is, response) were lower, ranging from 0.42 (CGI-I and CGI-S) to 0.61 (HAM-D17 and MADRS) at week 1 and from 0.61 (CGI-I and CGI-S) to 0.81 (HAM-D17 and MADRS) at week 8 (Figure 3). The correlations between binary outcomes at the FOT visit ranged from 0.68 (CGI-I and CGI-S) to 0.82 (MADRS and HAM-D17) (Figure 4). All correlation coefficients were significant at all data points (p < 0.0001).
Pooled effect sizes for the continuous outcomes ranged from 0.39 on the CGI-S to 0.42 on the CGI-I (Figure 5). Effect sizes for the binary outcomes were lower, ranging from 0.31 (CGI-S response) to 0.41 (CGI-I response).
Although differences were small, MADRS and CGI-I were better able to detect differences between venlafaxine and placebo than HAM-D17 or CGI-S for both sets of outcomes. Effect sizes across the individual studies varied considerably, but the pattern of results was largely consistent with that of the pooled data. In the majority of studies,

Discussion
The data presented here, which are derived from a large pooled dataset from 22 clinical trials, confirm and expand results of earlier comparisons of these 4 commonly used depression rating scales [4][5][6]. Previous analyses have included data from samples that were smaller and rather homogeneous in terms of baseline depression severity and duration of treatment; these analyses evaluated treatment effects with a variety of antidepressants, including tricyclic antidepressants, selective serotonin reuptake inhibitors, and serotonin-norepinephrine reuptake inhibitors [5,6]. The trials in this analysis all included patients with MDD. However, the diagnostic criteria differed according to the DSM criteria accepted at the time individual studies were designed. All studies in this analysis used venlafaxine; however, they differed in the venlafaxine formulation used, dosing regimens (fixed or flexible), and duration of study treatment. The variability among the studies analysed here did not appear to confound the results, as the observations made using the HAM-D17, MADRS, CGI-S, and CGI-I were highly correlated. Furthermore, despite the differences between this and other analyses, the findings are consistent [6]. As might be expected, the highest correlations were between the HAM-D17 and the MADRS rating scales, which share several items, have similar modes of administration and rating, and are generally performed by the same clinician. However, in some clinical trials, depression rating assessments and assessments of global illness severity or improvement may be performed by different clinicians; this may have contributed to the lower correlations between the HAM-D17 or MADRS scales and the CGI scales observed in this analysis. The consistently, although modestly, lower correlations between the CGI-S and CGI-I scales were unexpected, as these scales are sometimes considered interchangeable. However, this may be explained by the relatively narrow distribution of the score range (1 to 7) compared with the ranges for the HAM-D17 and MADRS total scores.
Although they were significant, correlation coefficients among binary outcomes based on the scales were lower than those for the change from baseline or FOT scores. Moreover, effect sizes were smaller for all scales in measuring the binary outcomes. These differences may be related to the definitions of response or no response that were used for the different scales. Some patients may have experienced significant improvement, which would be reflected in the change from baseline, although the scores did not meet the threshold for response.
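The effect of a response threshold on agreement between scales can be illustrated with a small sketch. The eight percentage decreases below are invented for illustration and are not study data; "scale A" and "scale B" are hypothetical stand-ins for any two symptom scales, and the helper name is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation; on 0/1 data this equals the phi coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Invented percentage decreases from baseline for 8 patients on two scales.
pct_a = [10, 20, 30, 45, 55, 60, 70, 80]
pct_b = [12, 18, 35, 52, 48, 62, 68, 85]

# Dichotomise at the 50% response threshold used for HAM-D17 and MADRS.
resp_a = [int(p >= 50) for p in pct_a]
resp_b = [int(p >= 50) for p in pct_b]

r_continuous = pearson_r(pct_a, pct_b)
r_binary = pearson_r(resp_a, resp_b)
```

With these invented values the continuous correlation is about 0.98, whereas the correlation between the binary outcomes is 0.5. The drop comes entirely from the two patients whose scores sit on either side of the 50% cut-off (45% versus 52%, and 55% versus 48%): the scales agree closely on the amount of improvement but disagree on the response classification.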

Conclusion
Overall, these results suggest that HAM-D17, MADRS, CGI-S, and CGI-I scores present a consistent picture of response to antidepressant therapy with venlafaxine.