The development and initial validation of self‐report measures of ICD‐11 depressive episode and generalized anxiety disorder: The International Depression Questionnaire (IDQ) and the International Anxiety Questionnaire (IAQ)

Abstract Background The new International Classification of Diseases came into effect in 2022 (ICD‐11; World Health Organization, 2022) and included updated descriptions and diagnostic rules for “Depressive Episode” and “Generalized Anxiety Disorder.” No self‐report measures align with these disorders so this study reports the development and initial validation of the “International Depression Questionnaire” (IDQ) and “International Anxiety Questionnaire” (IAQ). Methods Items were developed that aligned to the ICD‐11 descriptions and their performance was assessed using data from a community sample (N = 2058) that was representative of the United Kingdom adult population. Results Item response theory models indicated that the two scales were unidimensional, and the items performed well in terms of difficulty and discrimination. Estimates of internal reliability were high. Based on ICD‐11 derived diagnostic algorithms, 7.4% met requirements for ICD‐11 Depressive Episode and 7.1% for Generalized Anxiety Disorder. Conclusions The IDQ and the IAQ are short, easy to use, self‐report measures aligned to the new and updated ICD‐11 diagnostic descriptions. This study provides initial evidence that the scales produce scores that are reliable and valid.

line with the global standard for coding diagnostic health information, it is imperative that measures be available to researchers and clinicians that capture anxiety and depression symptoms, severity, and 'caseness', as per the ICD-11. It is also important as Depressive Episode is integral to the diagnosis Recurrent Depressive Disorder (diagnostic code 6A71), as well as Bipolar Type I Disorder (diagnostic code 6A60) and Bipolar Type II Disorder (diagnostic code 6A61).
Consequently, the primary aim of this study was to develop self-report measures that capture the symptoms and diagnostic requirements of ICD-11 Depressive Episode and ICD-11 Generalized Anxiety Disorder. The secondary aims were to provide preliminary evidence about the validity and reliability of these scale scores. To do so, validity was assessed by testing the dimensionality of the measures using item response theory (IRT) models, and we hypothesized that the measures would be unidimensional and provide most information above the mean of the underlying dimensions. Reliability was assessed using McDonald's omega (ω: McDonald, 1999). Additionally, prevalence estimates were produced in line with ICD-11 diagnostic requirements set forth in the ICD-11 Clinical Descriptions and Diagnostic Requirements (CDDR), and these were compared to criterion demographic (age and sex), mental health (mental health treatment seeking), and clinical (scores and 'caseness' on the PHQ-9 and GAD-7) variables.

| Participants
This study used data collected as part of Wave 6 of the COVID-19 Psychological Research Consortium (C19PRC) Study, which was established in March 2020 to assess the long-term psychological, social, and economic impact of the COVID-19 pandemic on the UK population. Briefly, at baseline (Wave 1, March 23-28, 2020), 2025 adults were recruited via the survey company Qualtrics using quota sampling methods to ensure that the sample characteristics were representative of the UK adult population with respect to age, sex, and 2019 household income. Data for Wave 6 were collected between August 6 and September 28, 2021, in two stages: at Phase 1 (August 6-September 28), all participants who had previously taken part in the main strand of the C19PRC Study (at baseline or recruited during subsequent waves) were recontacted; at Phase 2 (September 8-28), new participants were recruited to match specific characteristics of the adults lost to panel attrition. This resulted in a recontacted Phase 1 sample of 1643 (51.8% retention rate) and 415 new participants recruited at Phase 2. The combined final Wave 6 sample (N = 2058) closely mirrored the characteristics of the baseline sample and was representative of the UK adult population aged 18 years and older with respect to sex, age, and household income (see https://osf.io/qv47z/). Ethical approval for the study was granted by the University of Sheffield (Ref no. 033759). The Wave 6 data used in the current study are available at https://osf.io/qv47z/. Sample characteristics are reported in Table 1.

| Measures
The items, response format, and diagnostic algorithms of the International Depression Questionnaire (IDQ) and the International Anxiety Questionnaire (IAQ) 1 were derived directly from the ICD-11 descriptions of Depressive Episode and Generalized Anxiety Disorder. The questionnaires, and their alignment with ICD-11 descriptions, are presented in Tables 2a and 2b. The scales can be found at www.traumameasuresglobal.com/depression and www.traumameasuresglobal.com/anxiety. For both questionnaires the instructions explicitly state the ICD-11's time criterion, which is "a period lasting at least two weeks" for Depressive Episode ("Over the last two weeks, how frequently have you had the following feelings, thoughts, and behaviors?") and "at least several months" for Generalized Anxiety Disorder ("Over the last several months, how frequently have you had the following feelings, thoughts, and behaviors?"; see Tables 3a and 3b).
The ICD-11 requirement for the frequency of experiencing depressed mood or diminished interest in activities is "…occurring most of the day, nearly every day". The IDQ reflects this by suffixing the first two items with "…for most of the day" and treating response options 3 (Most days) and 4 (Every day) as indicative of endorsement. The ICD-11 requirement for the frequency of experiencing symptoms of anxiety is "…for at least several months, for more days than not". The IAQ reflects this by having the instructions state, "Over the last several months, how frequently have you…." and treating response options 3 (Most days) and 4 (Every day) as indicative of endorsement.

TABLE 3a Derivation of the International Anxiety Questionnaire from ICD-11 Description of Generalized Anxiety Disorder (6B00)

ICD-11 Generalized Anxiety Disorder (6B00)
Generalized Anxiety Disorder is characterized by marked symptoms of anxiety that (1) persist for at least several months, (2) for more days than not, manifested by (3) either general apprehension (i.e., "free-floating anxiety") or excessive worry focused on multiple everyday events, most often concerning family, health, finances, and school or work, (4) together with additional symptoms such as muscular tension or motor restlessness, sympathetic autonomic over-activity, subjective experience of nervousness, difficulty maintaining concentration, irritability, or sleep disturbance. (5) The symptoms result in significant distress or significant impairment in personal, family, social, educational, occupational, or other important areas of functioning.

1. Instructions state "Over the last several months, how frequently…"
2. Items are considered to be "endorsed" if response is "Most days" or "Every day."
3. Apprehension and excessive worry assessed by items 1 and 2.
4. Items 3 to 8 measure each of these symptoms.
5. Functional impairment is assessed by "Have these experiences caused problems in personal, family, social, educational, occupational, or other important areas of your life?"

TABLE 3b Self-report measure of ICD-11 Generalized Anxiety Disorder: The International Anxiety Questionnaire (IAQ)
Over the last several months, how frequently have you had the following feelings, thoughts, and behaviors? Please circle the appropriate number to indicate your response.
It is proposed that the IDQ and the IAQ can be scored to capture symptom severity and also to identify probable diagnostic cases (i.e., those that meet ICD-11 diagnostic requirements). The severity scoring method simply involves summing the scores of the nine IDQ items and the eight IAQ items, producing possible ranges of scores from 0 to 36 (IDQ) and 0 to 32 (IAQ), respectively. No cut-off scores are proposed, as "caseness" is defined by applying the ICD-11 diagnostic algorithm for each disorder (described below).
The ICD-11 CDDR for Depressive Episode requires "The concurrent presence of at least five of the … characteristic symptoms occurring most of the day, nearly every day during a period lasting at least 2 weeks. At least one symptom from the Affective cluster must be present". This equates to endorsing (i.e., scoring 3 or 4 on the Likert scale) questions 1 or 2 (or both) from the IDQ, and a total of 5 or more items being endorsed. If these conditions are met, and the functional impairment question is answered "Yes," then the diagnostic requirements for ICD-11 Depressive Episode have been met.
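The IDQ severity score and diagnostic rule described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' scoring code: the function names are hypothetical, and it assumes the nine items are supplied in questionnaire order, coded 0 to 4, with the functional impairment answer passed as a boolean.

```python
from typing import Sequence

ENDORSED_MIN = 3  # response options 3 (Most days) and 4 (Every day) count as endorsed


def idq_severity(items: Sequence[int]) -> int:
    """Sum the nine IDQ item scores (each 0-4), giving a total from 0 to 36."""
    if len(items) != 9 or any(not 0 <= x <= 4 for x in items):
        raise ValueError("expected nine item scores in the range 0-4")
    return sum(items)


def idq_probable_case(items: Sequence[int], impairment: bool) -> bool:
    """Apply the rule in the text: item 1 or 2 endorsed, at least five
    items endorsed in total, and functional impairment answered 'Yes'."""
    idq_severity(items)  # reuse the input validation
    endorsed = [x >= ENDORSED_MIN for x in items]
    affective = endorsed[0] or endorsed[1]
    return bool(affective and sum(endorsed) >= 5 and impairment)
```

For example, `idq_probable_case([3, 0, 3, 3, 3, 4, 0, 0, 0], True)` meets the rule (item 1 plus four other endorsed items), whereas five endorsed items that exclude items 1 and 2 do not.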
The ICD-11 CDDR for Generalized Anxiety Disorder requires the "Essential (Required) Features" of either "General apprehensiveness that is not restricted to any particular environmental circumstance (i.e., 'free-floating anxiety')" or "Excessive worry (apprehensive expectation) about negative events occurring in several different aspects of everyday life (e.g., work, finances, health, family)". This equates to endorsing (i.e., scoring 3 or 4 on the Likert scale) questions 1 or 2 on the IAQ. It also states that these essential features should be "…accompanied by additional characteristic symptoms".
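The IAQ rule set out in this section (at least one of the two essential-feature items endorsed, four or more of the eight items endorsed in total, and functional impairment answered "Yes") can be sketched in the same way. This is a minimal illustration under assumptions: the function names are hypothetical, items are assumed to be in questionnaire order and coded 0 to 4, and impairment is passed as a boolean.

```python
from typing import Sequence

ENDORSED_MIN = 3        # 3 (Most days) or 4 (Every day) counts as endorsed
MIN_ENDORSED_ITEMS = 4  # total number of endorsed items required by the IAQ


def iaq_severity(items: Sequence[int]) -> int:
    """Sum the eight IAQ item scores (each 0-4), giving a total from 0 to 32."""
    if len(items) != 8 or any(not 0 <= x <= 4 for x in items):
        raise ValueError("expected eight item scores in the range 0-4")
    return sum(items)


def iaq_probable_case(items: Sequence[int], impairment: bool) -> bool:
    """Item 1 or 2 endorsed, at least four items endorsed overall,
    and the functional impairment question answered 'Yes'."""
    iaq_severity(items)  # reuse the input validation
    endorsed = [x >= ENDORSED_MIN for x in items]
    essential = endorsed[0] or endorsed[1]
    return bool(essential and sum(endorsed) >= MIN_ENDORSED_ITEMS and impairment)
```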
For continuity with ICD-10 (WHO, 1993), the IAQ requires a total of 4 or more items to be endorsed with at least one from the essential features. If these conditions are met, and the functional impairment question is answered "Yes," then the diagnostic requirements for ICD-11 Generalized Anxiety Disorder have been met.

DSM-IV Major Depressive Disorder (MDD): Possible scores on the PHQ-9 range from 0 to 27, with higher scores reflecting higher symptomatology. The recommended and commonly used cut-off score of ≥10 was used to identify possible "caseness." This cut-off score has been shown to have adequate sensitivity (0.85) and specificity (0.89) for detecting cases of MDD (Kroenke et al., 2001). The psychometric properties of the PHQ-9 scores have been widely supported (Manea et al., 2012), and the internal reliability of the scale scores in this sample was α = 0.94.
DSM-IV Generalized Anxiety Disorder (GAD): The GAD-7 (Spitzer et al., 2006) measures seven symptoms of GAD described in DSM-IV (APA, 1994). It asks participants to indicate how often they have been bothered by the various symptoms over the last 2 weeks on a 4-point Likert scale that ranges from 0 (Not at all) to 3 (Nearly every day). Possible scores range from 0 to 21 where higher scores reflect greater symptomatology, and the recommended cut-off score of ≥10 was used to identify possible "caseness." This cut-off score has been shown to have adequate sensitivity (0.89) and specificity (0.82) for detecting cases of GAD (Spitzer et al., 2006). The psychometric properties of the GAD-7 scores have been widely supported (Hinz et al., 2017), and the internal reliability of the scale scores in this sample was α = 0.96.
Mental health treatment seeking: Participants were provided with the following information: "Mental health difficulties are very common. It will help us understand our survey results if you would tell us whether you currently or have in the past received treatment (medication or talking therapies) for these kind of difficulties". Participants were required to choose one of six options: (1) "I have never received treatment for mental health problems," (2) "I have received treatment for mental health problems in the past," (3) "I am currently receiving treatment for mental health problems," (4) "I am currently receiving treatment for mental health problems but it has been canceled temporarily due to the lockdown," (5) "I am currently on a waiting list to receive treatment for a mental health problem," and (6) "Prefer not to answer." Options 3 and 4 were collapsed into one category, and "Prefer not to answer" responses were treated as missing data, resulting in a 4-category variable: (1) No treatment ever, (2) Treatment in the past, (3) Treatment currently, and (4) Treatment waiting-list.

| Data analysis
First, the distributions of each of the IDQ and IAQ item scores were examined. The percentages of each response category were reported, along with the mean scores and percentage endorsements (score ≥ 3) as summary statistics. Item-total correlations were also calculated and expected to exceed the minimum acceptable value of ≥ 0.30 (Lamping et al., 2002). The summed scores were also calculated, and differences in sex, age, and mental health treatment seeking were tested using t-tests and one-way analysis of variance (ANOVA). The IDQ-PHQ-9 and IAQ-GAD-7 correlations were calculated using Pearson product-moment correlations.
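As an illustration of the item-total check, the following sketch computes Pearson correlations between each item and the total score. It is a hedged example, not the analysis code used in the study: the function names are hypothetical, and the `corrected` option (which removes the item from the total before correlating) is an assumption, since the text does not say whether corrected or uncorrected item-total correlations were used.

```python
from math import sqrt
from typing import List, Sequence

MIN_ACCEPTABLE = 0.30  # minimum acceptable item-total correlation cited in the text


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def item_total_correlations(
    responses: Sequence[Sequence[int]], corrected: bool = True
) -> List[float]:
    """responses holds one row per respondent and one column per item.
    With corrected=True each item is correlated with the sum of the
    remaining items; otherwise with the full total score."""
    totals = [sum(row) for row in responses]
    result = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [t - x for t, x in zip(totals, item)] if corrected else totals
        result.append(pearson_r(item, rest))
    return result


def items_below_minimum(correlations: Sequence[float]) -> List[int]:
    """Return 1-based item numbers whose correlation falls below 0.30."""
    return [i + 1 for i, r in enumerate(correlations) if r < MIN_ACCEPTABLE]
```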
Second, both 1- and 2-parameter IRT models were fitted to the data for the IDQ and IAQ separately. Binary item scores (score ≥ 3 on the Likert scale) were used to reflect the fact that the diagnostic algorithm uses item endorsement and not the full scale scores. For the 2-parameter model, discrimination and difficulty parameters were estimated for all items. The discrimination parameter is the probit regression coefficient that relates the latent variable, theta (θ), to the binary indicator, where higher values indicate increased discriminatory power. Desirable discrimination levels would be "high" (1.35-1.69) or "very high" (>1.70) (Baker, 1985). The difficulty parameter is estimated as thresholds. The 1-parameter model was also tested, in which the item discrimination parameters were constrained to be equal. If the 1- and 2-parameter models did not differ in fit, then the 1-parameter model was considered the better model on the basis of parsimony. The DIFFTEST function (Asparouhov et al., 2006)

Third, the prevalence estimates for ICD-11 Depressive Episode and ICD-11 Generalized Anxiety Disorder were calculated, and associations with sex, age, and mental health treatment were assessed using Pearson's χ² tests.
Proportions of people exceeding the ≥10 cut-off scores on the PHQ-9 and the GAD-7 were compared to the estimates obtained from the IDQ and the IAQ.
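To make the 2-parameter model described in this section concrete, the sketch below evaluates an item characteristic curve and item information function under a probit (normal-ogive) link. It rests on stated assumptions: the parameterization P(endorse | θ) = Φ(aθ − τ), with a the discrimination and τ the threshold, is one common convention and may differ from the software output used in the study; the conversion of a threshold to a difficulty on the θ scale (τ / a) follows from that assumed form; and the function names and example values are hypothetical, not estimates from the paper.

```python
from math import erf, exp, pi, sqrt


def norm_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def norm_pdf(z: float) -> float:
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def endorse_prob(theta: float, discrimination: float, threshold: float) -> float:
    """Item characteristic curve: probability of endorsing the item at theta."""
    return norm_cdf(discrimination * theta - threshold)


def difficulty(discrimination: float, threshold: float) -> float:
    """Theta value at which the endorsement probability equals 0.5."""
    return threshold / discrimination


def item_information(theta: float, discrimination: float, threshold: float) -> float:
    """Fisher information of a binary item: P'(theta)^2 / (P * (1 - P))."""
    z = discrimination * theta - threshold
    p = norm_cdf(z)
    return (discrimination * norm_pdf(z)) ** 2 / (p * (1.0 - p))


def total_information(theta: float, items: list) -> float:
    """Sum item information over (discrimination, threshold) pairs."""
    return sum(item_information(theta, a, t) for a, t in items)
```

Under this form an item is most informative near its difficulty, so items with difficulties around 1 or above concentrate information above the mean of the latent variable.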

| Descriptive statistics
The responses to the IDQ and IAQ items are reported in Table 4.
The total scale scores for the IDQ covered the entire range of possible scores (0-36) with a mean of 7.66 (SD = 9.27). The distribution was positively skewed (S = 1.24, se = 0.05, p < 0.001). The total scale scores for the IAQ also covered the entire range of possible scores (0-32) with a mean of 8.02 (SD = 8.44), and the distribution was also positively skewed (S = 1.09, se = 0.05, p < 0.001).
The mean IDQ score was significantly higher for females (M = 8.03, SD = 9.08) than males (M = 7.19, SD = 9.40:   Table 5. Pairwise comparisons using the Scheffé post hoc tests indicated that all means were significantly different (p < 0.001) except for "Treatment currently" and "Treatment Waiting List" for both IDQ and IAQ.

| IRT and reliability results
The 1 Table 6.
The IDQ discrimination parameter estimates were all statistically significant, and highest for item 5 ("Felt hopeless") and item 2 ("Experienced less interest or pleasure from normal activities for most of the day").
The difficulty parameter estimates ranged from 0.980 for item 9 ("Experienced reduced energy or fatigue") to 1.511 for item 6 ("Had recurrent thoughts of death or suicide"). Overall, the difficulty estimates indicate that these items are performing well at levels approximately 1 or more standard deviations above the mean of the underlying latent variable of depression. The item characteristic curves and total information curves are shown in Figures 1 and 2.
The IAQ discrimination parameter estimates were all statistically significant, and highest for item 5 ("Felt on edge"), item 3 ("Felt physically tense or agitated"), and item 1 ("Felt nervous or anxious for several months"). The difficulty parameter estimates ranged from 0.977 for item 2 ("Worried a lot about different things for several months?") to 1.359 for item 4 ("Felt your heart racing, difficulty breathing, stomach discomfort, or dry mouth?"). Like the IDQ, the difficulty estimates indicate that these items are performing well at levels approximately 1 or more standard deviations above the mean of the underlying latent variable of anxiety. The internal reliability of both the IDQ (ω = 0.96) and the IAQ (ω = 0.96) scale scores was high.
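For readers unfamiliar with McDonald's omega, the following sketch shows how the coefficient can be computed from the loadings of a unidimensional factor model. It is an assumption-laden illustration: the text does not describe how ω was obtained, so the sketch assumes standardized loadings with uncorrelated residuals (residual variance 1 − λ² per item), and the function name and loading values are hypothetical rather than taken from the study.

```python
from typing import Sequence


def mcdonald_omega(loadings: Sequence[float]) -> float:
    """Omega for a one-factor model with standardized loadings:
    (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances)."""
    if any(not -1.0 <= lam <= 1.0 for lam in loadings):
        raise ValueError("standardized loadings must lie between -1 and 1")
    common = sum(loadings) ** 2
    residual = sum(1.0 - lam ** 2 for lam in loadings)
    return common / (common + residual)


# Illustrative values only: nine items that all load 0.8 on a single factor.
example_omega = mcdonald_omega([0.8] * 9)
```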
There was no significant sex difference in ICD-11 Depressive Episode (Males = 7.8%, Females = 6.9%: There was a significant association between screening positive for ICD-11 Depressive Episode and mental health treatment seeking (χ 2 [3] = 128.01, p < 0.001). Of those who met diagnostic requirements, a higher percentage were currently receiving treatment (21.8%) or on a waiting list (32.0%) compared to those who had treatment in the past (9.5%) or had never received treatment (3.7%). Similarly, there was a significant association between screening positive for ICD-11 Generalized Anxiety Disorder and mental health treatment seeking (χ 2 [3] = 190.80, p < 0.001). A higher percentage of those that met diagnostic requirements were currently receiving treatment (25.3%) or were on a waiting list (38.0%) compared to those who were treated in the past (8.5%) or had never received treatment (3.1%).
FIGURE 1 Item characteristic curves and total information curves for IDQ. IDQ, International Depression Questionnaire.
The goal of this study was to develop brief, easy-to-use, and freely available self-report measures of ICD-11 Depressive Episode and ICD-11 Generalized Anxiety Disorder, and test their psychometric properties in a nationally representative general population sample. Our results provide preliminary support for the reliability and the validity of the IDQ and the IAQ scores, and indicate that these measures can be used to identify adults likely to be suffering from these disorders.
In a general adult population sample it would be expected that the scale items should be able to generate scores along the depression and anxiety continua, representing the absence of symptoms through to the highest levels of severity with all symptoms being endorsed. The distribution of scores on the IDQ and IAQ provided evidence to support this assumption as all response categories were used by the participants and, as expected, the scores were positively skewed. Furthermore, the scores were also homogeneous as indicated by the high levels of item-total correlations. The homogeneous nature of the items was confirmed with the high estimates of internal reliability for the IDQ and IAQ scores. These are desirable characteristics of item-level scores, and a necessary prerequisite for further psychometric analyses (Clark & Watson, 1995; Lamping et al., 2002).

FIGURE 2 Item characteristic curves and total information curves for IAQ. IAQ, International Anxiety Questionnaire.
Construct validity was supported by means of a 2-parameter IRT model based on the scores from the IDQ and IAQ. The fit statistics supported the hypothesis of unidimensionality, and the discrimination parameters indicated that the items on both scales performed well. For the IDQ, the two items with the highest discrimination were those measuring anhedonia (item 2), one of the two core affective symptoms required for diagnosis, and hopelessness (item 5). Previous research has identified hopelessness as an important factor in distinguishing depressed from nondepressed participants (McGlinchey et al., 2006). Indeed, one of the main differences between ICD-11 and DSM-5 is that in the latter hopelessness is a descriptor of depressed mood rather than a separate symptom. With regard to the IAQ, the discrimination for items 1 (Felt nervous or anxious), 2 (Worried a lot about different things), 3 (Felt physically tense or agitated) and 5 (Felt 'on edge') were all high, indicating that the two core symptoms (items 1 & 2) necessary for diagnosis are operating well.
The rates of ICD-11 Depressive Episode and ICD-11 Generalized Anxiety Disorder derived from the IDQ and IAQ were 7.4% and 7.1%, respectively. These are slightly higher than the most recent UK population prevalence estimates from the Adult Psychiatric Morbidity Survey 2014 (McManus et al., 2016), which found that the prevalence of past-week generalized anxiety disorder was 5.9% and past-week depression was 3.3%. These figures were based on a structured clinical interview, the Clinical Interview Schedule (CIS-R: Levis et al., 2020; Lewis et al., 1992), using the ICD-10 diagnostic requirements. Structured clinical interviews tend to produce lower prevalence estimates than self-reports (Thombs et al., 2018), and the past-week timeframe is also likely to account for the slightly lower prevalence estimates. Taking these factors into account, the estimates based on the IDQ and IAQ appear to be reasonably similar to those from an ICD-10 derived structured clinical interview. The slightly higher rates produced by the IDQ/IAQ relative to the CIS-R may be partly attributable to the psychological consequences of COVID-19, as there is evidence that rates of depression and anxiety increased somewhat after the pandemic (Patel et al., 2022); however, this effect has not been uniform across the entire population (Shevlin et al., 2022), and thus such an interpretation should be made cautiously.
In contrast to the conservative estimated prevalence rates of ICD-11 depressive disorder and generalized anxiety identified by the IDQ and IAQ, respectively, one-quarter of the sample exceeded the PHQ-9 threshold for depression (25.5%), and one-fifth exceeded the GAD-7 threshold for anxiety (20.7%). The largest study to date that has evaluated the diagnostic accuracy of the PHQ-9 (Levis et al., 2019) concluded that the recommended and commonly used cut-off score of 10 maximized combined sensitivity and specificity but also produced high levels of false positives (approximately 50% in a primary care setting). We propose that the close adherence to the ICD-11 symptoms and application of the diagnostic algorithm, rather than a cut-off score, would reduce the negative predicted value of the IDQ and the IAQ; although this is for future research to test. Encouragingly, the vast majority of individuals (>85%) who screened positive for depression and anxiety on the PHQ-9 and GAD-7 respectively, met diagnostic requirements for ICD-11 Depressive Episode and ICD-11 Generalized Anxiety Disorder on the IDQ and IAQ.
There was a significant association between meeting the requirements for ICD-11 Depressive Episode and ICD-11 Generalized Anxiety Disorder, and this was expected; the positive association between depression and anxiety at both the diagnostic and symptom level has been frequently documented (e.g., Jacobson & Newman, 2017; Möller et al., 2016). Indeed, this overlap has been widely acknowledged clinically (Kalin, 2020) and studies based on large general population samples have found that co-occurring clinically relevant anxiety and depression was more common than either anxiety or depression alone ( disorder and generalized anxiety (4.6%) than depressive disorder alone (2.7%) and generalized anxiety alone (2.3%).
There is a provision in the ICD-11 to be able to identify "Prominent anxiety symptoms in Mood Episodes" (6A80.0) and "Mixed depressive and anxiety disorder" (6A73) which can accommodate the co-occurrence of symptoms from both disorders. The IDQ and IAQ provide the opportunity to assess these symptoms and disorders.
There are some limitations of this study. First, as this study provides only initial evidence of validity of these newly developed measures, future research is needed to establish the degree of agreement between clinical interview assessment and IDQ/IAQ scores. Second, while the study did not use data from a probability-based sample, the sample characteristics of age, sex, and household income were representative of the UK adult population. Third, future research on the performance of these scales is required in clinical settings with participants displaying clinically significant levels of mood and anxiety distress. Given the international focus of the ICD-11, cross-cultural analyses are also required.
In conclusion, the IDQ and the IAQ are brief self-report measures directly derived from the ICD-11 diagnostic descriptions of Depressive Episode and Generalized Anxiety Disorder and freely available to all interested parties.
Initial findings from analyses based on data from a large nationally representative sample of the UK adult population are encouraging: They indicate that these scales (1) produced adequate variability in scale scores, (2) have high levels of internal consistency, (3) have high/very high levels of discrimination, (4) tap information at the upper end of the underlying distributions, and (5) appear to be related to mental health help-seeking.

ACKNOWLEDGMENT
The initial stages of this project were supported by start-up funds from the University of Sheffield (Department of Psychology, the Sheffield Methods Institute and the Higher Education Innovation Fund via an Impact Acceleration grant administered by the university) and by the Faculty of Life and Health Sciences at Ulster University. The

CONFLICT OF INTEREST
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
Data are publicly available at https://osf.io/qv47z/

TRANSPARENT PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1002/jclp.23446

Mark Shevlin
http://orcid.org/0000-0001-6262-5223

ENDNOTES
1 The IDQ/IAQ and the PHQ-9/GAD-7 were not in consecutive parts of the survey and the order in which they were presented was randomized to remove any order effects.
2 The models were also estimated using the data from the original 5-category responses.