Reliability, validity and psychometric properties of the Greek translation of the Major Depression Inventory

Background The Major Depression Inventory (MDI) is a brief self-rating scale for the assessment of depression. It is reported to be valid because it is based on the universe of symptoms of DSM-IV and ICD-10 depression. The aim of the current preliminary study was to assess the reliability, validity and psychometric properties of the Greek translation of the MDI. Methods Thirty depressed patients of mean age 23.41 (± 5.77) years and 68 controls of mean age 25.08 (± 11.42) years entered the study. In 18 of them, the instrument was re-applied 1–2 days later. Translation and back-translation were performed. Clinical diagnosis was reached with the use of the SCAN v.2.0 and the International Personality Disorders Examination (IPDE). The Center for Epidemiological Studies-Depression (CES-D) and the Zung Depression Rating Scale (ZDRS) were applied for cross-validation purposes. Statistical analysis included ANOVA, the Spearman Rank Correlation Coefficient, Principal Components Analysis and the calculation of Cronbach's α. Results Sensitivity and specificity were 0.86 and 0.94, respectively, at the cut-off 26/27. Cronbach's α for the total scale was 0.89. Spearman's rho was 0.86 between the MDI and the CES-D and 0.76 between the MDI and the ZDRS. The factor analysis revealed two factors, but the first accounted for 54% of the variance while the second accounted for only 9%. The test-retest reliability was excellent (Spearman's rho between 0.53 and 0.96 for individual items and 0.89 for the total score). Conclusion The current study provided preliminary evidence concerning the reliability and validity of the Greek translation of the MDI. Its properties are similar to those reported in the international literature, but further research is necessary.


Background
The Major Depression Inventory (MDI) [1] is a brief self-rating scale for the assessment of depression. Self-report instruments such as the Beck Depression Inventory [2] and the Zung Depression Rating Scale (ZDRS) [3] are frequently used for the measurement of depression.
However, most of these instruments are old and were developed during the pre-DSM-III era. Thus, they reflect an older concept of what depression is and how it should be rated. These scales are intended for use as screening tools rather than as substitutes for an in-depth interview [4]. In recent years, however, several instruments have been developed in accordance with the DSM-IV criteria, such as the Beck Depression Inventory-II, the Inventory to Diagnose Depression and others.
The Major Depression Inventory (MDI) is reported to be valid [5,6] because it is based on the universe of symptoms of DSM-IV and ICD-10 moderate to severe depression. The MDI items are rated by frequency, using the past two weeks as the time frame. In principle the scale contains 12 items, because items 8 and 10 each have two sub-items, a and b. Functionally, though, the MDI has only 10 items, since for items 8 and 10 only the higher of the two sub-item scores counts. Each of the 10 items is scored from 0 (at no time) to 5 (all of the time), so the total score ranges from 0 to 50, and there are no sub-scores. The MDI can also be scored algorithmically. In order to diagnose major depression, the symptoms should have been present nearly every day during the past 2 weeks. Because the MDI is based on the DSM-IV and ICD-10 definitions of depression, a score of 4 or more on an item (that is, at least most of the time) counts towards the ICD-10 or DSM-IV algorithm. The ICD-10 algorithm of the MDI for moderate depression requires a score of 4 or 5 on two of the three top items and on at least four of the remaining items. The DSM-IV algorithm of the MDI for major depression requires a score of 4 or 5 on five of the nine items (item 4 being excluded), at least one of which must be either depressed mood or loss of interest [7].
Both scoring systems were tested in the current study.
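A minimal Python sketch of the two scoring systems described above follows. It is an illustration under stated assumptions, not official MDI scoring software: the input format (keys such as "8a" and "10b") is hypothetical, the "three top items" are assumed to be items 1–3 and the "remaining items" items 4–10, and depressed mood and loss of interest are assumed to be items 1 and 2.

```python
from typing import Dict, Mapping

# Hypothetical input format: keys "1".."10", with items 8 and 10 given as
# sub-items "8a"/"8b" and "10a"/"10b"; every value is a score from 0 to 5.
ITEMS = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
SPLIT_ITEMS = {"8", "10"}


def functional_scores(answers: Mapping[str, int]) -> Dict[str, int]:
    """Collapse the 12 raw answers into the 10 functional MDI items."""
    for key, value in answers.items():
        if not 0 <= value <= 5:
            raise ValueError(f"item {key}: score {value} is outside 0-5")
    scores = {}
    for item in ITEMS:
        if item in SPLIT_ITEMS:
            # Only the higher of sub-items a and b counts.
            scores[item] = max(answers[item + "a"], answers[item + "b"])
        else:
            scores[item] = answers[item]
    return scores


def total_score(answers: Mapping[str, int]) -> int:
    """Sum of the 10 functional items (range 0-50)."""
    return sum(functional_scores(answers).values())


def present(score: int) -> bool:
    """A symptom counts for the algorithms at 4 (most of the time) or 5."""
    return score >= 4


def icd10_moderate(answers: Mapping[str, int]) -> bool:
    s = functional_scores(answers)
    # Assumption: "top items" = items 1-3, "remaining items" = items 4-10.
    top = sum(present(s[i]) for i in ITEMS[:3])
    rest = sum(present(s[i]) for i in ITEMS[3:])
    return top >= 2 and rest >= 4


def dsm4_major(answers: Mapping[str, int]) -> bool:
    s = functional_scores(answers)
    counted = [i for i in ITEMS if i != "4"]  # nine items, item 4 excluded
    n_present = sum(present(s[i]) for i in counted)
    # Assumption: item 1 = depressed mood, item 2 = loss of interest.
    core = present(s["1"]) or present(s["2"])
    return n_present >= 5 and core
```

For example, a respondent who answers 4 on every item has a total score of 40 and fulfils both algorithms, whereas one who scores 5 on items 3, 5, 6, 7, 8a and 9 and 0 elsewhere reaches a total of 30 (above the 26/27 cut-off) yet fulfils neither algorithm, because no core item reaches 4.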
The aim of the current preliminary study was to assess the reliability, validity and psychometric properties of the Greek translation of the Major Depression Inventory (MDI).
Patients were free of any medication for at least two weeks and were physically healthy, with normal clinical and laboratory findings (electroencephalogram, blood and biochemical testing, thyroid function, pregnancy test, vitamin B12 and folic acid).
Patients came from the inpatient and outpatient units of the 3rd Department of Psychiatry, Aristotle University of Thessaloniki, General Hospital AHEPA, Thessaloniki, Greece. They were consecutive cases, chosen because they fulfilled the above criteria.
The normal control group was composed of members of the hospital staff and students. A clinical interview confirmed that they did not suffer from any mental disorder and that their prior history was free of mental and thyroid disorders. They were free of any medication for at least two weeks and were physically healthy.
All patients and controls provided written informed consent before participating in the study.

Method
Translation and back-translation were performed by two of the authors: one made the translation, and the other, who did not know the original English text, made the back-translation. The final translation was fixed by consensus of all authors.
The clinical diagnosis was reached by consensus of two examiners. The Schedules for Clinical Assessment in Neuropsychiatry (SCAN) version 2.0 [10,11] and the International Personality Disorders Examination (IPDE) [12–15] were used. Both were applied by one of the authors (KNF), who has received official training at a World Health Organization Training and Reference Center. The IPDE did not contribute to the clinical diagnosis of depression but was used as part of a global and comprehensive assessment of the patients. The second examiner performed an unstructured interview.
The Center for Epidemiological Studies-Depression (CES-D) [16] and the Zung Depression Rating Scale (ZDRS) [17] were applied to the subjects for cross-validation. The clinical diagnosis was used as the 'gold standard' for the validation of the MDI. The use of a semi-structured interview strengthens this approach, although it has certain inherent limitations.

Statistical Analysis
Analysis of Variance (ANOVA) [18] was used to search for differences between groups. Principal Components Analysis (before and after varimax normalized rotation) was performed, and factor coefficients and scores were calculated.
Item Analysis [19] was performed, and Cronbach's α was calculated for the MDI. Receiver Operating Characteristic (ROC) curves and a histogram of frequencies were also created.
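Cronbach's α can be computed from the item variances and the variance of the total score: α = k/(k - 1) × (1 - Σ var(item) / var(total)), where k is the number of items. The sketch below assumes a hypothetical data layout (one list of scores per item, subjects in the same order) and uses sample variances from Python's statistics module; it illustrates the formula and is not the software used in the study.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k items.

    items: k lists of scores, one list per item (hypothetical layout),
    with the same subjects in the same order in every list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    totals = [sum(subject) for subject in zip(*items)]  # per-subject total score
    sum_item_var = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - sum_item_var / variance(totals))
```

Perfectly parallel items give α = 1; for the MDI the k items would be the 10 functional items.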

Reliability assessment (test-retest)
The Spearman Rank Correlation Coefficient (rho) was calculated to assess test-retest reliability. However, the calculation of correlation coefficients is not a sufficient method for testing the reliability and reproducibility of a method and its results, because a correlation coefficient is an index of association and not an index of agreement [18,20,21]. The calculation of means and standard deviations for each MDI item and the total score during the 1st (test) and 2nd (retest) applications may provide an impression of the stability of results over time (table 4).
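Spearman's rho is Pearson's correlation computed on ranks, with tied values given the average of the ranks they span. The dependency-free sketch below illustrates this; the helper names are illustrative, and in practice a statistics package would normally be used. It also shows the point made above: a rho of 1 means subjects keep their rank order between test and retest, not that their scores agree.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(test, retest):
    """Spearman's rho = Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(test), average_ranks(retest))
```

For instance, `spearman_rho([10, 20, 30], [20, 30, 40])` is 1.0 even though every retest score is 10 points higher than the test score.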
Also, the means and standard deviations of the test-retest differences for each MDI item were calculated, and plots of test vs. retest values and of difference vs. average value were created for each variable. In fact, it is not possible to use statistics to define acceptable agreement [18]; however, these plots can assist the decision. It is not possible to show all of these plots, but those concerning the total MDI score are shown in figures 2 and 3. This method has been used in previous studies concerning the validation of scientific methods [22]. In addition, the 'Process Analysis Gage Repeatability and Reproducibility' module of StatSoft STATISTICA was used to further investigate the repeatability of the MDI with the use of Analysis of Variance (ANOVA) [23]. The purpose of this analysis is to determine the proportion of measurement variability that is due to (1) the subjects being assessed, (2) the MDI items (method) used for the measurement, and (3) the trials (in our case, test vs. retest).
In the ideal case, only a negligible proportion of the variability will be due to trial-to-trial repeatability.
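The idea behind this analysis can be approximated with an ordinary three-way sums-of-squares decomposition for a fully crossed subjects × items × trials design with one score per cell. The sketch below is a simplified stand-in, not the STATISTICA module: that module reports variance-component estimates derived from mean squares, which differ from raw sums-of-squares shares, so these numbers would not reproduce table 5. The data layout `y[s][i][t]` is a hypothetical choice.

```python
from itertools import product

EFFECTS = ["subjects", "items", "trials", "subjects x items",
           "subjects x trials", "items x trials", "residual"]


def ss_shares(y):
    """Share of the total sum of squares per effect (simplified stand-in,
    not STATISTICA's Gage R&R variance components).

    y[s][i][t]: score of subject s on item i at trial t (0 = test,
    1 = retest); balanced design, one score per cell.
    """
    S, I, T = len(y), len(y[0]), len(y[0][0])
    cells = list(product(range(S), range(I), range(T)))
    g = sum(y[s][i][t] for s, i, t in cells) / (S * I * T)  # grand mean
    m_s = [sum(y[s][i][t] for i in range(I) for t in range(T)) / (I * T) for s in range(S)]
    m_i = [sum(y[s][i][t] for s in range(S) for t in range(T)) / (S * T) for i in range(I)]
    m_t = [sum(y[s][i][t] for s in range(S) for i in range(I)) / (S * I) for t in range(T)]
    m_si = [[sum(y[s][i]) / T for i in range(I)] for s in range(S)]
    m_st = [[sum(y[s][i][t] for i in range(I)) / I for t in range(T)] for s in range(S)]
    m_it = [[sum(y[s][i][t] for s in range(S)) / S for t in range(T)] for i in range(I)]
    ss = dict.fromkeys(EFFECTS, 0.0)
    for s, i, t in cells:
        ss["subjects"] += (m_s[s] - g) ** 2
        ss["items"] += (m_i[i] - g) ** 2
        ss["trials"] += (m_t[t] - g) ** 2
        ss["subjects x items"] += (m_si[s][i] - m_s[s] - m_i[i] + g) ** 2
        ss["subjects x trials"] += (m_st[s][t] - m_s[s] - m_t[t] + g) ** 2
        ss["items x trials"] += (m_it[i][t] - m_i[i] - m_t[t] + g) ** 2
        # Without replication the residual is the three-way interaction.
        ss["residual"] += (y[s][i][t] - m_si[s][i] - m_st[s][t] - m_it[i][t]
                           + m_s[s] + m_i[i] + m_t[t] - g) ** 2
    total = sum(ss.values())  # equals the total SS in a balanced design
    if total == 0:
        raise ValueError("all scores are identical")
    return {name: value / total for name, value in ss.items()}
```

If every subject gives identical answers at test and retest, the four shares involving trials (trials, subjects × trials, items × trials and the residual) are exactly zero, which is the ideal case described above.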

Results
The calculation of sensitivity (Sn) and specificity (Sp) at various cut-off levels showed that the optimum combination was a sensitivity of 0.86 and a specificity of 0.94 at the cut-off 26/27, with 64 controls and 26 patients correctly classified. This means that a subject scoring 26 or lower is classified as normal, while a subject scoring 27 or higher is classified as depressed. Four controls and 4 patients were classified into the wrong diagnostic group (table 1).
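The cut-off logic can be made explicit. Sensitivity is the proportion of patients scoring above the cut-off and specificity the proportion of controls scoring at or below it; with the counts above, 26/30 ≈ 0.867 and 64/68 ≈ 0.941, which the text reports as 0.86 and 0.94. The text does not state which criterion defined the "optimum combination", so the cut-off scan below maximizes Youden's J (Sn + Sp - 1) purely as an assumed criterion. The score lists are placeholders chosen only to reproduce the reported counts, not study data.

```python
def sens_spec(patient_scores, control_scores, cutoff=26):
    """Cut-off c/c+1: totals <= c are classified normal, totals > c depressed."""
    true_pos = sum(score > cutoff for score in patient_scores)
    true_neg = sum(score <= cutoff for score in control_scores)
    return true_pos / len(patient_scores), true_neg / len(control_scores)


def best_cutoff(patient_scores, control_scores, cutoffs=range(0, 50)):
    """Assumed criterion: maximise Youden's J = Sn + Sp - 1 (first maximum wins)."""
    def youden(c):
        sn, sp = sens_spec(patient_scores, control_scores, c)
        return sn + sp - 1
    return max(cutoffs, key=youden)


# Placeholder scores that merely reproduce the reported counts (not study data).
patients = [30] * 26 + [20] * 4
controls = [10] * 64 + [35] * 4
sn, sp = sens_spec(patients, controls, cutoff=26)
print(f"Sn = {sn:.3f}, Sp = {sp:.3f}")  # prints Sn = 0.867, Sp = 0.941
```

With real item data, `best_cutoff` would scan every possible c/c+1 split of the 0-50 range; other criteria (for example a minimum required sensitivity) could be substituted in `youden`.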
Cronbach's α for the total scale was 0.89. This high value indicates high internal consistency, compatible with the MDI reflecting a single underlying structure.
The histogram of MDI scores in control subjects reveals that they are not normally distributed in this population but are skewed, with most scores concentrated at lower values (figure 1).
The MDI total score correlated highly with both the CES-D and the ZDRS: Spearman's rho was 0.86 for the CES-D and 0.76 for the ZDRS.
The factor analysis of cases revealed the presence of two factors (table 2). The results before and after varimax normalized rotation were essentially the same. The first factor includes all items except item 10B and explains 54% of the variance. The second includes only item 10B (increased appetite) and explains only 9% of the variance. Thus, in essence, there is a single-factor solution (table 2).
Depressed patients did not differ from controls in age. They did, however, differ in every individual MDI item score and in the total score (p < 0.001; table 3, see additional file 1), except for item 10B (increased appetite).
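The variance percentages follow from the eigenvalues of the analysed matrix. Assuming, as is usual, that the principal components analysis was run on the correlation matrix of p standardized variables, the eigenvalues λ1, ..., λp sum to p, so the share of variance explained by component j is

$$\text{share}_j = \frac{\lambda_j}{\sum_{m=1}^{p} \lambda_m} = \frac{\lambda_j}{p}$$

Because the text does not state whether the 10 functional items or the 12 raw items (including sub-items 8a/8b and 10a/10b) were entered, the eigenvalues behind the reported 54% and 9% are not reconstructed here.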
Figure 1
Histogram of the distribution of the total MDI scores in normal subjects

The test-retest reliability proved to be satisfactory. Individual items had good Spearman rank correlation coefficients, the lowest concerning item 7 (difficulty with concentration, rho = 0.53) and the highest concerning item 8A (restlessness, rho = 0.96). The coefficient for the total MDI score was very good and equal to 0.91 (table 4). The bivariate scatterplot of the test vs. retest values of the total MDI score (figure 2) suggests that the total MDI score is a reliable variable, since the points of the test-retest plot lie very close to the regression line (the bisector of the axes). Also, the bivariate scatterplot of the differences between measurements vs. the average value of measurements for the total MDI score (figure 3) suggests that the total MDI score is a reliable variable, since all but 2 of the points lie within 2 standard deviations of the mean difference.
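The difference vs. average plot follows the Bland–Altman limits-of-agreement approach. A minimal sketch of the underlying computation is given below, using the 2 SD limits mentioned in the text (1.96 SD is the other common convention); the function name and return layout are hypothetical.

```python
from statistics import mean, stdev


def limits_of_agreement(test, retest, k=2.0):
    """Mean test-retest difference and mean +/- k*SD limits.

    k=2.0 mirrors the 2 SD range used in the text. Returns
    (mean_difference, (lower, upper), points_outside), where points_outside
    lists (average, difference) pairs lying beyond the limits.
    """
    diffs = [r - t for t, r in zip(test, retest)]
    averages = [(t + r) / 2 for t, r in zip(test, retest)]
    d_bar = mean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    lower, upper = d_bar - k * sd, d_bar + k * sd
    outside = [(a, d) for a, d in zip(averages, diffs) if not lower <= d <= upper]
    return d_bar, (lower, upper), outside
```

Plotting each difference against its average, with horizontal lines at `d_bar`, `lower` and `upper`, gives a plot of the kind shown in figure 3; the length of `outside` corresponds to the "all but 2" count.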
The comparison between the values obtained at test and those obtained at retest revealed no differences (table 4). The ANOVA results (table 5) suggest that only 0.01% of total variance is due to differences inherent to the test-retest procedure, with an additional 2.16% attributable to the interaction of this procedure with subjects and the test itself. This is a very low rate.
The DSM-IV and ICD-10 algorithms performed very poorly, and their use is not recommended for Greek patients. Most patients, although clinically diagnosed as depressed, fulfilled fewer than 5 DSM-IV criteria (24 patients with fewer than 5 vs. 6 with 5 or more) and fewer than 4 ICD-10 criteria (28 patients with fewer than 4 vs. 2 with 4 or more) when the algorithms were applied. Two control subjects fulfilled DSM-IV criteria and 3 controls fulfilled ICD-10 criteria for depression according to the same algorithms.

Discussion
The present study is a preliminary effort to obtain data concerning the psychometric properties of the Greek translation of the MDI. The preliminary nature of the results should be stressed, because the properties of the scale need further study in larger and more representative samples.
The use of self-report scales is frequent in psychiatric research. However, it is also well known that such scales depend heavily on the cooperation and reading ability of the patient, and that their performance is influenced by the theoretical background of their development. On the other hand, they save clinician time. The MDI is a new self-rating scale for depression in both community and clinical settings, and the literature concerning its transcultural reliability and validity is limited. The current study reports observations on the reliability, validity and psychometric properties of the Greek translation of the MDI. The results suggest that this translation is well suited for use in the Greek population, with high sensitivity and specificity at the 26/27 cut-off, high test-retest reliability and high internal consistency. Its factor structure is similar to that reported in the literature.
In order to diagnose major depression, the symptoms should have been present nearly every day during the past 2 weeks. The MDI is based on the DSM-IV and ICD-10 definitions of depression (its major advantage over older and widespread instruments), and a score of 4 or more on an item (that is, at least most of the time) counts towards the ICD-10 or DSM-IV algorithm. The ICD-10 algorithm of the MDI for moderate depression requires a score of 4 or 5 on two of the three top items and on at least four of the remaining items. The DSM-IV algorithm of the MDI for major depression requires a score of 4 or 5 on five of the nine items (item 4 being excluded), at least one of which must be either depressed mood or loss of interest. However, the use of these algorithms is not supported by the results of the current study, in which their performance was very poor. The reason for this is unknown, but it may imply that subjects score the MDI items in a way closer to a 'visual analog scale' and do not strictly follow the instructions.
The reliability and validity of the MDI have been tested in a limited number of studies, and no translation of this scale has been published. This contrasts with the large literature concerning the ZDRS [17,24–27] and the CES-D [16,28–32].
The use of the MDI in one large population study [33] revealed a favorable profile for the scale. A comparative study of the ZDRS and the MDI in 89 patients with Parkinson's disease [7] suggested that the MDI is superior to the ZDRS. That study reported only one general factor for the MDI, which explained 58.3% of the variance (54% in the current study), and a higher coefficient of homogeneity than for the ZDRS. Cronbach's α for the total scale in that study was 0.92.
A typical standardization study [1] reported a Cronbach's α for the total scale of 0.94, which is comparable to the 0.89 found in the current study. In that study, the sensitivity was 0.90 and the specificity 0.82 when the MDI algorithms were used for the DSM-IV diagnosis, in sharp contrast with the results of the current study. When the 26/27 cut-off was used, the sensitivity was 1.00 (0.86 in the current study) and the specificity 0.82 (0.94 in the current study).
Review studies of various self-administered instruments suggest that they do not differ significantly in performance, with an overall sensitivity of around 0.84 and specificity of around 0.72 [34]. These instruments are of particular value in primary care settings, because primary care providers fail to diagnose and treat as many as 35% to 50% of patients with depressive disorders [35,36]. Depression is one of the most common psychiatric diagnoses in primary care populations [37]; major depressive disorders can be diagnosed in 6% to 9% of such patients. Obstacles to the appropriate recognition of depression include inadequate provider knowledge of diagnostic criteria; competing comorbid conditions and priorities among primary care patients; time limitations in busy office settings; concern about the implications of labeling; poor reimbursement mechanisms; and uncertainty about the value, accuracy and efficiency of screening mechanisms for identifying patients with depression. Given that 50% to 60% of persons seeking help for depression are treated exclusively in the primary care setting, accurate detection in this setting is important [38], and self-administered instruments may help to overcome some of these obstacles.
On the other hand, it should be noted that the diagnosis of depression is itself based on symptoms. A patient cannot truly be asymptomatic and have major depressive disorder. Thus, these screening questionnaires are actually being evaluated for their ability to detect unrecognized, rather than strictly asymptomatic, depressive symptoms and disease. They are also useful for the assessment of severity but not for the diagnosis per se.

Figure 2
Bivariate scatterplot between the test and retest values of the total MDI score. The plot suggests that the total MDI score is a reliable variable: the points of the test-retest plot lie very close to the regression line (the bisector of the axes).

Figure 3
Bivariate scatterplot of the difference between measurements vs. the average value of measurements concerning the total MDI score. The plot suggests that the total MDI score is a reliable variable: all but 2 of the points lie within 2 SD of the mean difference.
It should also be stressed that the current study offers only preliminary data. The study sample is small, retest data are available for only 18 subjects, and the factor analysis included both patients and controls. Complete validation requires applying the scale to larger samples with more sophisticated methodology, including the use of samples of borderline severity.

Conclusion
The Greek translation of the MDI scale is both reliable and valid and is suitable for clinical and research use, with satisfactory properties. Its properties are similar to those reported in the international literature. However, one should always bear in mind the limitations inherent in the use of self-report scales.
The ANOVA results (table 5) suggest that only 0.01% of total variance is due to differences inherent to the test-retest procedure, with an additional 2.16% (0.54 + 0.04 + 1.58) attributable to the interaction of this procedure with subjects and the test itself. This is a very low rate.