The evaluation of the General Health Questionnaire (GHQ-12) reliability generalization: A meta-analysis

Background The General Health Questionnaire (GHQ-12) is widely used for detecting psychiatric disorders, but its reliability across different populations remains to be determined. Objective This meta-analysis aims to evaluate the reliability of the GHQ-12 across varied cultural and demographic settings. Method This meta-analysis evaluates the reliability of the GHQ-12 across diverse populations, employing a systematic search strategy, rigorous inclusion criteria, and a pre-registered protocol (CRD42023488436) to reduce the risk of bias. Data from 20 studies published between 2016 and 2023 were analysed using a random-effects model, with quality assessment guided by COSMIN Risk of Bias and QUADAS-2. Results For the GHQ-12 subscales, Cronbach’s alpha coefficients were 0.72 (90% CI [0.68, 0.75]) for anxiety and depression, 0.82 (90% CI [0.79, 0.86]) for social dysfunction, and 0.72 (90% CI [0.68, 0.76]) for loss of confidence. However, the analysis showed substantial heterogeneity (I² = 90.04%), with significant variability in reliability estimates across different studies. The overall Cronbach’s alpha was 0.84 (95% CI [0.810, 0.873]) with SE = 0.016 (90% CI [0.68, 0.82], p < .05), indicating moderate to high internal consistency. Quantifying heterogeneity revealed a substantial level (SE = 0.0016, I² = 96.7%), signifying considerable variability in the reliability estimate among the studies. Results further show that Cronbach’s alpha coefficients for the 12 GHQ-12 items range from 0.82 to 0.85 (95% CIs from [0.77, 0.86] to [0.81, 0.90]). Conclusion While reaffirming the GHQ-12’s utility in mental health assessment, our findings urge a more cautious and context-aware application of the questionnaire. The substantial heterogeneity and variability in reliability scores indicate a need for further research.
Future studies should explore the reasons behind this variability, focusing on cultural, socio-economic, and methodological factors that might influence the GHQ-12’s reliability. This critical analysis underscores the need for a deeper understanding of the GHQ-12’s applicability and the importance of tailoring mental health assessment tools to specific population characteristics.


Introduction
Mental health assessment is an indispensable pillar in the comprehensive understanding and effective management of psychological well-being. Self-report mental health assessments are commonly used in research contexts, but they have limitations due to the subjective nature of self-reporting and the lack of temporal precision [1,2]. Amid the myriad assessment tools, the General Health Questionnaire (GHQ-12) has emerged as a versatile instrument designed to appraise mental health status across various populations, cultures, and linguistic contexts [3,4]. The GHQ-12, rooted in Goldberg and Hillier's theoretical framework, assesses an individual's mental health status through questions encompassing psychological well-being and distress [4].
The GHQ-12 has transcended geographical boundaries, demonstrating its utility across diverse populations, cultures, and languages [5,6]. Notably, brevity and adaptability have contributed to its widespread application in clinical and research settings, facilitating a quick yet comprehensive screening of psychological distress and well-being [7,8].
The General Health Questionnaire (GHQ-12) has emerged as a widely utilized instrument for gauging mental health across diverse populations. Studies have shown that individuals with clinically diagnosed emphysema experience poorer general mental health, increased levels of social dysfunction and anhedonia, heightened depression and anxiety, as well as an elevated loss of confidence [9].
The GHQ-12 has shown construct and criterion validity and reliability, along with gender and age differences in scores, among hospitalized patients with COVID-19 [5,7]. Researchers have used item response theory (IRT) to analyze GHQ-12 responses and have identified subgroups of individuals based on their level of psychological distress [7]. The GHQ-12 has also been used to screen women in the reproductive age group for their mental health status, with results showing that psychological stressors are present in all three groups of women [10]. Additionally, the GHQ-12 has been standardized and used to compare different population cohorts, providing valuable data for public health and clinical practice [11].
The reliability of the General Health Questionnaire-12 (GHQ-12) has been evaluated in various studies. A study that evaluated the reliability and validity of the GHQ-12 in Chinese dental healthcare workers found that the two-factor model of the GHQ-12 had good reliability and validity, making it suitable for assessing the mental health status of this group [12]. Another study investigated the factor structure of the GHQ-12 for South Korean university students and found that Graetz's three-factor model provided the best fit to the data, indicating that the Korean version of the GHQ-12 is a robust measure of general psychological distress symptoms [13]. A further study tested the bifactor structure of the GHQ-12 among Brazilian physicians and found that it had good psychometric properties, although caution is needed when interpreting specific factors [14].
A robust literature review underscores the GHQ-12's extensive usage in various populations worldwide. In a study of South African healthcare workers, the GHQ-12 showed adequate reliability and validity, with a four-factor model suggesting multidimensionality in this group [15]. Another study in Spanish adolescents found that the GHQ-12 is a one-dimensional test for screening psychological distress with excellent psychometric properties [16]. Among male tannery workers in India, the GHQ-12 was found to be reliable and valid, with a three-factor structure [17]. In German and Pakistani university teachers, the GHQ-12 demonstrated good psychometric properties, with high reliability and validity [6,18]. A Brazilian-Portuguese version of the GHQ-12 showed good internal reliability and factorial validity in both clinical and non-clinical groups [19]. Overall, these studies suggest that the GHQ-12 is a reliable and valid tool for measuring mental health in various populations.
The GHQ-12's cross-cultural adaptability has been examined, considering the potential influence of cultural variations on the interpretation of mental health indicators. A study in a Finnish population confirmed the GHQ-12's reliability for measuring psychological health but suggested limited predictive ability for mental health service use [20]. Similarly, research evaluating the GHQ-12's validity for detecting common mental disorders among men in Goa, India, demonstrated acceptable criterion validity but suggested a need for ongoing validation in the local context [21]. A study assessing the measurement equivalence of the GHQ-12 across six ethnic groups in the UK found the instrument to be invariant, allowing valid population comparisons, although caution is urged for individual screenings [22].
In the Spanish version of the GHQ-12, a multidimensional three-factor structure has been identified as the best-fitting model [23]. A reliability study among Australian students aged 7-19 found compelling evidence of robust internal consistency and a stable two-factor structure, affirming the GHQ-12's suitability for mental well-being assessment [24]. Research on Korean early childhood teachers establishes the GHQ-12 as a dependable tool with a three-factor structure, affirming its reliability and validity for effectively assessing psychiatric symptoms [25].
In contrast, research exploring the factorial validity of the GHQ-12 in outpatients with psychological disorders in China found that the three-factor model demonstrated superior fit and that the GHQ-12 exhibited measurement invariance across genders, affirming its reliability and applicability in clinical mental health assessments [26]. Similarly, research on the factor structures of the GHQ-12 among rural Chinese residents found adequate reliability for the instrument's two- and three-factor structures, suggesting its applicability for assessing mental health in rural China based on the number of kin [27]. That study emphasises the importance of cultural and contextual considerations in utilising the GHQ-12 in diverse populations. A study validating the Persian version of the GHQ-12 for Iranian elders found a two-factor structure with adequate validity and reliability, making it a suitable tool for assessing the general health of Persian-speaking elderly populations [28].
Another study assessed the GHQ-12's reliability for Iranian elders and found good internal consistency and satisfactory test-retest reliability, supporting its application for mental health assessment [29]. A study validating the GHQ-12 for assessing psychological distress in chronic low back pain patients demonstrated good reliability, construct validity, and responsiveness, supporting its utility in this population [30]. A study of older adults in India confirmed the GHQ-12's reliability, finding high internal consistency and a valid two-factor structure [31]. A study evaluating the GHQ-12 for use with autistic adults without learning difficulties found good psychometric properties, supporting the measure's validity in this population [32]. Another study [33] revealed that the Indonesian adaptation of the GHQ-12 demonstrates favourable internal consistency and construct validity. This suggests its appropriateness for mental health screening in primary care patients, acknowledging potential trade-offs between sensitivity and specificity.
The rationale for employing a meta-analysis lies in its capacity to synthesize findings from numerous studies, allowing for a comprehensive examination of the GHQ-12's reliability. This approach is crucial for capturing the inherent variability in reliability estimates across different populations, periods, and cultural contexts [34]. While previous studies have assessed the psychometric properties of the GHQ-12 [28,32,33], a systematic evaluation of its reliability across a large number of studies is notably absent. This study seeks to fill this void by conducting a reliability generalization meta-analysis, aiming to identify patterns, trends, and potential moderators that may influence the GHQ-12's reliability.
Findings from this meta-analysis hold substantial implications for mental health practitioners, researchers, and policymakers. A nuanced understanding of the GHQ-12's reliability can guide its appropriate use in diverse settings, ultimately enhancing the accuracy of mental health assessments and the subsequent interventions based on those assessments. In undertaking this meta-analysis, we aspire to contribute valuable insights that refine the understanding of the GHQ-12's reliability and advance the broader field of mental health assessment, aligning with the ongoing efforts to provide efficient and accurate tools for evaluating psychological well-being.

Study design
We conducted a reliability generalization (RG) meta-analysis to comprehensively evaluate the psychometric properties of the General Health Questionnaire (GHQ-12). The study protocol, outlining the specific methods for the reliability generalization meta-analysis, was pre-registered in the PROSPERO database with the registration number CRD42023488436. This pre-registration was implemented to enhance transparency, prevent selective reporting, and minimize the risk of bias in the study design and analysis.

Search strategy
This systematic review implemented a predetermined search strategy to comprehensively evaluate the reliability of the General Health Questionnaire (GHQ-12). A thorough search was conducted across multiple databases, including PubMed, PsycINFO, Medline, CINAHL, ScienceDirect, Scopus, Web of Science, Google Scholar, APA PsycArticles, and EBSCO, targeting studies published between 2016 and 2023 that focused on the GHQ-12's reliability across diverse populations.

Inclusion and exclusion criteria
This systematic review encompasses empirical studies that provide quantitative reliability data for the General Health Questionnaire (GHQ-12). The meta-analysis incorporated data from 20 independent studies, selected for their pertinence to the evaluation of psychometric properties of the GHQ-12.
Inclusion criteria cover cross-sectional and validation studies, along with psychometric evaluations detailing GHQ-12 reliability, specifically focusing on studies reporting reliability metrics such as Cronbach's alpha and test-retest reliability. The inclusion criteria also encompass studies involving participants from diverse age groups (including adolescents, adults, and the elderly) to capture the GHQ-12's applicability across the lifespan. The research spans various cultural and geographic settings, assessing the scale's reliability across different socio-cultural contexts. It includes both clinical populations (individuals diagnosed with mental health disorders) and non-clinical populations (e.g., general community samples, students) to understand the scale's reliability in different mental health states.
The review includes participants of all gender identities and sexual orientations to ensure a comprehensive examination. It considers studies utilising both the original English version of the GHQ-12 and validated translations, examining reliability across different linguistic contexts.
Exclusion criteria involve qualitative research, case reports, opinion pieces, theoretical papers without empirical data, systematic reviews, meta-analyses, and non-empirical studies. Non-peer-reviewed materials such as grey literature, dissertations, theses, and conference abstracts without peer review are excluded to maintain scientific rigour.
Additionally, research published in languages other than English is excluded unless a validated translation of the GHQ-12 is used, ensuring consistency in the measurement tool's application. By focusing on diverse quantitative designs, this review comprehensively assesses GHQ-12 reliability across different populations and settings, enhancing our understanding of its psychometric robustness and application in measuring psychological well-being. This approach ensures a broad yet precise examination of the GHQ-12's empirical utility and generalizability.

Publication bias and quality assessment
Publication bias was assessed using funnel plots and statistical tests to check that the meta-analysis included studies with diverse results, mitigating the risk of bias. In the assessment of study quality, the review method strictly adhered to the reliability standards of the COnsensus-based Standards for the selection of health Measurement Instruments Risk of Bias checklist (COSMIN RB [35]). This checklist served as a robust framework, ensuring a thorough evaluation of the methodological quality and potential risk of bias within the studies included in the review.
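The text does not name the statistical test that accompanied the funnel plots. As an illustration only, the sketch below implements Egger's regression, one commonly used funnel-plot asymmetry test: each study's standardized effect (estimate divided by its standard error) is regressed on its precision (one divided by the standard error), and an intercept far from zero suggests asymmetry. The function name `egger_test` and the example inputs are hypothetical and are not taken from the included studies.

```python
import math

def egger_test(estimates, std_errors):
    """Egger's regression: regress estimate/SE on 1/SE and inspect the intercept.

    Returns (intercept, se_intercept, t_statistic). This is an assumed,
    illustrative choice of test, not necessarily the one used in the review.
    """
    n = len(estimates)
    if n != len(std_errors) or n < 3:
        raise ValueError("need at least three studies with matching standard errors")
    z = [y / s for y, s in zip(estimates, std_errors)]  # standardized effects
    p = [1.0 / s for s in std_errors]                   # precisions
    mean_p = sum(p) / n
    mean_z = sum(z) / n
    sxx = sum((pi - mean_p) ** 2 for pi in p)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    slope = sum((pi - mean_p) * (zi - mean_z) for pi, zi in zip(p, z)) / sxx
    intercept = mean_z - slope * mean_p
    # Residual variance of the ordinary least squares fit (n - 2 degrees of freedom).
    resid_ss = sum((zi - (intercept + slope * pi)) ** 2 for pi, zi in zip(p, z))
    s2 = resid_ss / (n - 2)
    se_intercept = math.sqrt(s2 * (1.0 / n + mean_p ** 2 / sxx))
    t_stat = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, se_intercept, t_stat

# Hypothetical inputs purely for demonstration.
intercept, se_int, t_stat = egger_test([0.10, 0.30, 0.20, 0.50], [0.10, 0.20, 0.15, 0.30])
print(intercept, se_int, t_stat)
```

The t statistic would be compared against a t distribution with n - 2 degrees of freedom; with few studies the test has low power, so it is usually read alongside the funnel plot rather than on its own.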
To enhance the credibility of the assessment, a comprehensive evaluation of study quality was conducted by two independent investigators. This evaluation employed both the Quality Assessment of Diagnostic Accuracy Studies tool (QUADAS-2 [36]) and the COSMIN RB checklist [35]. This dual and rigorous examination was undertaken to provide an in-depth analysis of the methodological robustness and potential biases inherent in the studies included in the review. By employing these recognized and validated assessment tools, the quality assessment process aimed to ensure the reliability, transparency, and validity of the findings, ultimately contributing to the credibility of the overall review (see S1 Fig and S1 Table).

Data extraction and study selection
Systematic extraction of pertinent data from the selected studies involved capturing study characteristics, demographics, reliability coefficients, validity measures, and other essential information related to GHQ-12 psychometric properties. The meta-analysis applied eligibility criteria to select relevant research, with two independent reviewers screening titles and abstracts. Full-text articles underwent assessment for final inclusion, and a comprehensive documentation process was implemented to ensure consistency and accuracy. Disagreements were resolved by consensus between the reviewers, ensuring a robust and reliable compilation of data for further analysis.

Statistical analysis
A meta-analytic approach was employed to synthesise reliability coefficients across individual studies, utilising random-effects models to address potential heterogeneity. In this meta-analysis, effect sizes were calculated based on Cronbach's alpha, which measures the internal consistency or reliability of a scale.
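For readers less familiar with the coefficient, the following minimal sketch computes Cronbach's alpha from raw item scores using the standard formula alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the small response matrix are hypothetical; the primary studies reported alpha directly, so this is background illustration rather than part of the review's pipeline.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    n_respondents = len(scores)
    k = len(scores[0])
    if n_respondents < 2 or k < 2:
        raise ValueError("need at least two respondents and two items")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent must answer the same number of items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of the total (summed) score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses: 5 respondents, 4 items scored 0-3.
responses = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
print(round(cronbach_alpha(responses), 3))
```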
The formula for the effect size is $\sqrt{\text{Cronbach's alpha}}$. A qualitative synthesis summarised and interpreted the reliability evidence for the General Health Questionnaire (GHQ-12). Reliability coefficients were then pooled using a random-effects model. Heterogeneity was assessed using the I² statistic and subgroup analyses. Data analysis was performed using the metafor package in RStudio.
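Random-effects pooling of reliability coefficients can be sketched as follows. This is an illustrative stand-in, not the analysis reported here: it uses the DerSimonian-Laird estimator of between-study variance and, because the square-root effect size needs a sampling variance that the text does not give, it swaps in Bonett's transformation ln(1 - alpha) with approximate variance 2k / ((k - 1)(n - 2)), where k = 12 items is assumed. The alphas and sample sizes are those quoted in this article for nine of the 20 included studies, so the output is not expected to reproduce the pooled estimates reported in the Results. Function names are hypothetical.

```python
import math

def bonett_transform(alpha, n_items, n):
    """Assumed transformation: y = ln(1 - alpha) with its approximate variance."""
    y = math.log(1.0 - alpha)
    v = 2.0 * n_items / ((n_items - 1) * (n - 2))
    return y, v

def dersimonian_laird(y, v):
    """DerSimonian-Laird random-effects pooling of estimates y with variances v."""
    if len(y) != len(v) or len(y) < 2:
        raise ValueError("need at least two studies with matching variances")
    w = [1.0 / vi for vi in v]                      # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (vi + tau2) for vi in v]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"pooled": pooled, "se": se, "tau2": tau2, "q": q, "df": df, "i2": i2}

# (alpha, sample size) pairs quoted in this article for nine included studies.
studies = [
    (0.872, 47), (0.784, 432), (0.810, 504), (0.700, 18070), (0.920, 4270),
    (0.820, 773), (0.830, 870), (0.750, 32083), (0.892, 3020),
]
transformed = [bonett_transform(a, 12, n) for a, n in studies]
result = dersimonian_laird([t[0] for t in transformed], [t[1] for t in transformed])
pooled_alpha = 1.0 - math.exp(result["pooled"])     # back-transform to the alpha scale
print(round(pooled_alpha, 3), round(result["i2"], 1))
```

Pooling on a transformed scale and back-transforming afterwards is a design choice that keeps the sampling distribution closer to normal; a different transformation (including the square root used in this study) would give somewhat different pooled values.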

Ethical considerations
While formal ethical clearance was not obtained for our study on the reliability generalization of the General Health Questionnaire (GHQ-12) through meta-analysis, our commitment to upholding ethical standards remains unwavering. We prioritize participant confidentiality, secure informed consent, and maintain research integrity. Our dedication to transparency and methodological rigour is evident through PROSPERO protocol registration and adherence to PRISMA guidelines. These practices ensure a thorough and responsible approach, instilling confidence in the scientific community regarding the validity and reliability of our findings.
During the second screening phase, a total of 13,374 studies were assessed against the inclusion and exclusion criteria, leading to the exclusion of 13,537 articles. This left a pool of 131 studies for further screening. Out of these, only 48 studies were deemed eligible based on the study inclusion criteria. Finally, a subset of 20 studies from the 48 full articles, which met the study inclusion criteria, was included in the meta-analysis.

Selection and reliability coefficients induction
In examining the psychometric properties of the General Health Questionnaire-12 (GHQ-12) across various studies, diverse patterns emerge, shedding light on the questionnaire's applicability in different contexts. A study in Spain [35] involving 47 female nurses revealed a robust Cronbach's alpha of 0.872 and a moderately stable test-retest reliability of 0.594. Another study [36] explored the GHQ-12 in an Indian undergraduate population (N = 432), finding a Cronbach's alpha of 0.784, though test-retest reliability was not specified. Lee and Kim's investigation [13] in South Korea (N = 504) found a Cronbach's alpha of 0.810 in undergraduate students. Centofanti [24] conducted an extensive study with 18,070 participants in Australia, reporting a Cronbach's alpha of 0.700.
The GHQ-12 exhibited high internal consistency in diverse settings. Elovainio et al.'s [20] study in Finland (N = 4,270) with a general population demonstrated a robust Cronbach's alpha of 0.920. Endsley et al. [2017] in India (N = 773) found a Cronbach's alpha of 0.820 among the general population, with the questionnaire administered in Konkani. The Chinese context was explored by Liu et al. [26] and Guan & Han [27], with N = 870 and N = 32,083, respectively, reporting satisfactory Cronbach's alphas of 0.830 and 0.750. Zhong et al. [2022] investigated dental healthcare workers in China (N = 3,020), revealing a high Cronbach's alpha of 0.892 and a notable test-retest reliability of 0.843. The synthesis underscores the GHQ-12's reliability across diverse populations and settings, encouraging its use as a valid instrument for mental health assessment. However, careful consideration of cultural and demographic differences is crucial for sound interpretation.

GHQ-12 subscale reliability
This study investigates the reliability generalization of the General Health Questionnaire (GHQ-12) subscales, specifically focusing on Anxiety and Depression, Social Dysfunction, and Loss of Confidence. A meta-analysis was conducted to estimate Cronbach's alpha for each subscale, and the results, detailed in Table 1, showed varying reliability estimates for individual subscales. For Anxiety and Depression, the meta-analysis yielded a Cronbach's alpha coefficient of 0.72 (90% CI [0.68, 0.75]). For Social Dysfunction, the estimated Cronbach's alpha was 0.82 (90% CI [0.79, 0.86]), and for Loss of Confidence, it was 0.72 (90% CI [0.68, 0.76]). The overall estimate of Cronbach's alpha was 0.75 (SE = 0.04, 95% CI [0.68, 0.82]), indicating a moderate to high level of internal consistency across the GHQ-12 scales. The random-effects meta-analysis revealed substantial heterogeneity among the studies (I² = 90.04%). The Q test for heterogeneity was significant (Q [df = 2] = 20.04, p < .05), indicating notable variability in Cronbach's alpha estimates across the subscales.
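As a quick check on the significance statement for the subscale Q test, the p-value can be worked out by hand, assuming Q follows a chi-square distribution with df = 2 under homogeneity. For two degrees of freedom the upper-tail probability has the closed form $e^{-Q/2}$:

$$p = P\left(\chi^2_2 > Q\right) = e^{-Q/2} = e^{-20.04/2} = e^{-10.02} \approx 4.5 \times 10^{-5}$$

This is far below the .05 threshold, consistent with the reported result.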

GHQ-12 items reliability assessment
This analysis aimed to evaluate the reliability of individual items within the General Health Questionnaire (GHQ-12). The specific focus was on the internal consistency, measured by Cronbach's alpha, for each item. Table 1 presents the meta-analysis results for the individual items. The random-effects meta-analysis revealed no heterogeneity in the item-level estimates (I² = 0.00%). The test for heterogeneity was non-significant (Q = 1.689, p > .05), indicating homogeneity in reliability estimates across the 12 items of the GHQ-12.

GHQ-12 test-retest assessment
The meta-analysis aimed to assess the General Health Questionnaire (GHQ-12) test-retest reliability across a diverse set of studies (k = 4). According to the random-effects model, a significant overall estimate of 0.78 was observed (95% CI [0.6567, 0.9045]), indicating a moderate to high level of consistency in GHQ-12 test-retest reliability (see Table 1). Regarding heterogeneity, the analysis revealed substantial variability among the studies, with an I² of 87.3% and a τ² of 0.0134. The Q-test for heterogeneity was highly significant (Q = 23.63, df = 3, p < 0.05), indicating significant differences in test-retest reliability estimates across studies. The forest plot (Fig 5) visually depicts individual study estimates and their contributions.
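The link between the Q statistic and I² can be made explicit with the standard Higgins-Thompson definition, which expresses the share of total variability attributable to between-study differences. Plugging in the test-retest values reported above (Q = 23.63, df = 3):

$$I^2 = \max\left(0, \frac{Q - df}{Q}\right) \times 100\% = \frac{23.63 - 3}{23.63} \times 100\% \approx 87.3\%$$

This agrees with the reported I² of 87.3%.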

Discussion
The assessment of the overall reliability of the General Health Questionnaire (GHQ-12) revealed commendable results, backed by a robust coefficient alpha estimate. This aligns with the consensus in existing literature [13,35], but a critical examination of the methodologies employed in these studies prompts closer scrutiny of the reliability measures used. While our findings support the GHQ-12's reputation as a reliable mental health assessment tool across diverse populations, it is crucial to question the generalizability of these results and consider potential biases introduced by various study designs.
The second objective focused on investigating the reliability generalization of the GHQ-12 subscales: Anxiety and Depression, Social Dysfunction, and Loss of Confidence. The Anxiety and Depression and Loss of Confidence subscales exhibited moderate reliability (pooled alpha of 0.72 each), while the Social Dysfunction subscale demonstrated higher reliability (0.82). These results are consistent with previous studies, highlighting variations in reliability across different dimensions of psychological distress [14,16]. To gain a more detailed understanding of the instrument's reliability, a more rigorous and critical analysis of the psychometric properties associated with each subscale is imperative.
The evaluation of individual items within the GHQ-12, which showed consistently high reliability, prompts a critical examination of the underlying constructs measured by each item. While the study echoes the emphasis in [32] on robust internal consistency, a deeper exploration into the construct validity of each item is crucial. Critical discussions surrounding the theoretical underpinnings of these items and their alignment with contemporary conceptualizations of mental health will contribute to a more detailed interpretation of the results.
Turning attention to the reliability variations across distinct language versions, our study has revealed noteworthy differences in Cronbach's alpha. A critical evaluation of the impact of cultural and linguistic nuances on mental health assessments is imperative. While supporting the findings of [30] and the suitability of the Indonesian version [33], it is crucial to address potential cultural biases and language-related challenges that may affect the cross-cultural applicability of the GHQ-12.
Finally, the examination of test-retest reliability drew on results from four distinct studies. A critical examination of the potential sources of variability in test-retest reliability is essential, including factors such as the time interval between assessments and the stability of mental health conditions over time. While our results align with the findings of [29] among Iranian elders, a more critical analysis will provide a comprehensive understanding of the temporal stability of GHQ-12 scores.

Implications
This research evaluates the reliability of the General Health Questionnaire (GHQ-12) across diverse populations, highlighting the need for culturally sensitive adaptations. It also emphasises the need for ongoing refinement of mental health assessment tools, guiding future modifications and improvements. The study contributes valuable knowledge to mental health, supporting healthcare professionals and researchers in understanding and addressing mental health issues. It advocates for a combination of qualitative and quantitative approaches to comprehensively understand mental health conditions and their assessment.

Future directions
Future research on the GHQ-12 should focus on longitudinal studies to better understand its temporal stability and delve into reliability across diverse demographic and cultural groups, particularly underrepresented populations. Emphasis on cross-cultural validation and cultural adaptations of the GHQ-12 will improve its applicability. Detailed examination of subscale reliability and factors influencing variability is crucial, alongside comparisons with standard diagnostic measures. Combining qualitative and quantitative methods will enrich the understanding of mental health assessments. Developing culturally tailored instruments will address cross-cultural variations, enhancing the GHQ-12's effectiveness in diverse settings, thereby refining it as a reliable mental health assessment tool.

Conclusion
According to our meta-analysis, the General Health Questionnaire (GHQ-12) has good overall reliability, indicating that it can be used with confidence to assess mental health in a range of populations. Despite variations in subscale reliability, individual items consistently exhibit high reliability. However, the differing effects of culture, language, and study heterogeneity emphasise the need for caution when interpreting data. The GHQ-12 is a helpful tool whose continued refinement should increase our understanding of mental health and make it more applicable in various contexts.

Fig 1 illustrates the PRISMA 2020 flowchart outlining the study selection process and screening. The search across various databases, including Medline, APA PsycINFO, CINAHL, APA PsycArticles, EBSCO, PubMed, Web of Science, Scopus, ScienceDirect, Google Scholar, and others, yielded a total of 13,572 research articles. The titles and abstracts of these articles were initially screened, excluding 198 duplicate studies.

Fig 1. PRISMA flow chart of study selection for reliability generalization meta-analysis of General Health Questionnaire-12 (GHQ-12). https://doi.org/10.1371/journal.pone.0304182.g001
Fig 2 provides a visual representation of the meta-analysis results. Fig 3 provides a visual representation of the risk of bias in the included studies.
Fig 4 provides a visual representation of the meta-analysis results.