Quality of mental health questionnaires in conflict-affected adult populations in low and middle income countries: A systematic review

Highlights
• Quality is variable for conflict-affected populations’ mental health questionnaires
• We found moderate evidence for reliability and validity but none for responsiveness
• Equity in authorship and populations covered must be improved
• Research capacity in conflict-affected settings needs strengthening
• We recommend stronger use of conceptual frameworks and reporting standards


Introduction
An estimated 172 million people are affected by armed conflict worldwide, including over 59 million people forcibly displaced from their homes, either within their countries as internally displaced persons (IDPs) or into new countries as refugees (Centre for Research on the Epidemiology of Disasters, 2013). Conflict is associated with increases in both physical and mental health needs coupled with the breakdown of health systems (Spiegel et al., 2010; Roberts and Browne, 2011). Mental health disorders are more prevalent among populations exposed to conflict; a systematic review and meta-analysis of prevalence estimates of mental disorders in conflict-affected settings found that the estimated total prevalence of depression, anxiety, post-traumatic stress disorder (PTSD), bipolar disorder, and schizophrenia was 22·1% (95% UI 18·8-25·7) (Charlson et al., 2019). Poor mental health among conflict-affected populations is related to exposure to violent and traumatic events, forced migration, increased daily stressors related [...] populations if they have been validated appropriately. Expert consensus has prioritised the need to strengthen the evidence base for appropriate methods to assess the mental health and psychosocial needs of populations in humanitarian settings, in order to improve mental health and psychosocial support in those settings (Tol et al., 2011). Collecting health data on conflict-affected populations is challenging for reasons such as the security risks posed to researchers and participants during data collection, highly mobile populations necessitating rapid data collection methods and impeding follow-up, limited resources and capacity, and ethical concerns (Siriwardhana et al., 2013; Blanchet et al., 2017; Checchi et al., 2017). These factors can make it difficult to collect data on mental health and hinder the development of mental health questionnaires specific to these contexts.
Consequently, although the vast majority of conflict-affected populations reside in low and middle income countries (LAMICs) (Internal Displacement Monitoring Centre, 2015; United Nations High Commissioner for Refugees, 2014), questionnaires to measure mental health are mostly developed in English-speaking high-income countries and based on the understanding of mental health that is prevalent in these countries.
Meta-analyses of the prevalence of PTSD and depression in conflict-affected populations have found that a large proportion of the variation in results between studies arose from methodological factors such as the choice of questionnaires (Charlson et al., 2019; Steel et al., 2009; Fazel et al., 2005). Evidence in LAMICs (albeit not with conflict-affected populations) suggests that questionnaires are often not appropriately validated before their use (Tsai et al., 2013; Tsai, 2014). A systematic review from 2002 on health status questionnaires used with refugees identified 183 papers and found that measurements were mainly derived from "instruments that have limited or untested validity and reliability in refugees" (Hollifield et al., 2002). However, this review covered refugees only and was dominated by studies in high-income countries. There has also been a very large increase in the number of mental health papers published on conflict-affected populations since 2002 (Blanchet et al., 2017). To date, no systematic reviews have been published on the suitability and appropriateness of mental health questionnaires developed or evaluated for conflict-affected populations in LAMICs. The aim of this systematic review is to assess the quality of questionnaires for mental disorders that have either been developed or validated in conflict-affected settings in LAMICs.

Search strategy and selection criteria
The systematic review method followed PRISMA guidelines (Moher et al., 2009).
The databases searched were CINAHL Plus, EMBASE, Global Health, MEDLINE and PsycINFO. The initial search was carried out on 12th August 2016 and then updated on 16th October 2019. The search included all articles published from the inception of each database to the last search date.
Search terms were developed for three concepts: measurement properties, mental health and armed conflict. The search was conducted using search filters coupled with a comprehensive set of free search terms and index terms from the Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) guidelines (Terwee et al., 2009). The full search terms are given in the online supplementary materials (Appendix A). The reference lists of the studies included in the review were also manually searched.
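As an illustration of how a three-concept search of this kind is typically assembled, the sketch below joins synonyms within each concept using OR and then joins the concepts using AND. This combination logic is an assumption based on common practice and is not stated in the text. The terms shown are placeholders chosen for illustration, not the authors' search terms (those are in Appendix A), and the function names `or_block` and `build_query` are hypothetical.

```python
def or_block(terms):
    """Join the synonyms for one concept with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(concepts):
    """Combine one OR-block per concept using AND (assumed combination logic)."""
    return " AND ".join(or_block(terms) for terms in concepts.values())


# Placeholder terms for illustration only; the real terms are in Appendix A.
concepts = {
    "measurement properties": ["reliability", "validity", "psychometric*"],
    "mental health": ["depression", "anxiety", "post-traumatic stress disorder"],
    "armed conflict": ["armed conflict", "refugee*", "internally displaced"],
}

query = build_query(concepts)
print(query)
```

Keeping each concept as a separate list makes it straightforward to adapt the same structure to the syntax and index terms of each database.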

Inclusion criteria
The population of interest was civilian adults (aged 18+ years) in LAMICs who were either forcibly displaced by conflict within their own country (IDPs) or outside of their own country (refugees), following standard definitions (Roberts and Browne, 2011; Deng, 1998; United Nations, 1951), or who were currently living in a conflict-affected area or one affected by conflict within the last 5 years (including returned IDPs and refugees). Armed conflict was defined as "a contested incompatibility which concerns government and/or territory where the use of armed force between two parties, of which at least one is the government of a state, results in at least 25 combatant battle-related deaths per year" (Uppsala University, 2015). The primary aim of included studies had to be to develop a mental health questionnaire or to evaluate the measurement properties of a pre-existing questionnaire in a conflict setting. A questionnaire was considered unique if it had been newly developed for a conflict-affected population or adapted for a new conflict-affected population.
Articles were included if they reported at least one measurement property of a self-reported questionnaire measuring a specific mental health disorder as defined in an edition of the International Classification of Diseases (ICD) or the Diagnostic and Statistical Manual of Mental Disorders (DSM), or of a generic questionnaire with a specifically identified cut-off point for a diagnosable disorder.
Only studies published in a peer-reviewed journal in English or French were included.

Exclusion criteria
Studies whose participants were primarily displaced for reasons other than conflict (e.g. natural disasters), or were war combatants or military veterans, were excluded.
Studies that included results from validating a questionnaire but did not have validation as a primary aim were excluded as many of these studies did not present adequate information about the validation methods for quality appraisal.
Studies on questionnaires measuring general psychological health and mental distress were excluded to focus on how suitable existing questionnaires are for detecting mental health disorders recognised in international classifications. Results from studies describing assessments that were based only on clinical-rating scales, interviews, group discussions, performance-based tests, diaries, videos, telephone calls, laboratory tests, or imaging were also excluded.

Data extraction
Retrieved articles were transferred to Mendeley Version 1.19.4. Duplicates were removed and titles and abstracts were screened. For those studies appearing to meet the inclusion criteria, the full text was retrieved for confirmation. For queries about whether papers met the inclusion criteria that could not be resolved on review of the full text, the authors were contacted for clarification.
For included articles, data about the measurement properties of each questionnaire were extracted using a standard data extraction form and compiled into tables. For questionnaires that had originally been developed in different settings (the adapted questionnaires), the original development papers were then sought. The data from these original development papers were compiled into a separate table for comparison with the results from the new conflict-affected settings. The search strategy, study selection and data extraction were carried out by one of the authors (SC) with any queries discussed with two of the other authors (BR and SS).
[Fragment of the quality appraisal criteria table] Responsiveness: the ability of a scale to detect clinically important change over time, assessed by comparing scores before and after an intervention of known efficacy (on the basis of various methods including t-tests, effect sizes, standardised response means, or responsiveness statistics).
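Two of the responsiveness statistics named above, the effect size and the standardised response mean, can be sketched as follows. The formulas are the conventional ones (mean change divided by the standard deviation of baseline scores, and mean change divided by the standard deviation of change scores); the review does not specify which formulation any included study used. The function name `responsiveness_stats` and the scores are invented for illustration.

```python
from statistics import mean, stdev


def responsiveness_stats(before, after):
    """Return (effect size, standardised response mean) for paired scores."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need paired scores for at least two respondents")
    change = [a - b for b, a in zip(before, after)]
    # Effect size: mean change divided by the SD of the baseline scores.
    effect_size = mean(change) / stdev(before)
    # Standardised response mean: mean change divided by the SD of the changes.
    srm = mean(change) / stdev(change)
    return effect_size, srm


# Invented symptom scores before and after an intervention (illustration only).
before = [3.0, 2.5, 3.5, 2.0, 3.0]
after = [2.0, 2.0, 2.5, 1.5, 2.5]

es, srm = responsiveness_stats(before, after)
print(f"effect size = {es:.2f}, SRM = {srm:.2f}")
```

Negative values here indicate falling symptom scores. The sketch raises a division error if every respondent changes by exactly the same amount, since the standard deviation of the change scores is then zero.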

Critical appraisal
[Fragment of Table 1, quality appraisal criteria] Significant differences between known groups or difference of expected magnitude. Grading system for acceptability: 0 = no evidence in favour, + = limited evidence in favour, ++ = moderate evidence in favour, +++ = strong evidence in favour.
These quality appraisal criteria (Table 1) were applied to all the questionnaires identified through the search, using the data collected from the study population under investigation for each unique questionnaire. For the adapted questionnaires, the quality appraisal criteria were also applied to their parent questionnaires using the data from their original development paper(s). The available evidence for each psychometric property for each questionnaire was rated on a 4-point rating scale (no evidence; limited evidence; moderate evidence; strong evidence). For the questionnaires identified through the search, the quality appraisal process was carried out independently by two of the authors (SC and JL), who then discussed any discrepancies with one of the other authors (SS) until reaching consensus. For the parent questionnaires of the adapted questionnaires, the quality appraisal process was carried out by one of the authors (SC) with any queries discussed with one of the other authors (SS).
Internal consistency was evaluated for almost all questionnaires, and there was generally strong evidence in its favour. The other indicators of reliability were evaluated much less frequently: test-retest reliability was reported for only 4 questionnaires and inter-rater reliability for 5.
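Internal consistency is commonly summarised with Cronbach's alpha, although the text here does not state which statistic the included studies reported. The sketch below is a minimal illustration of that coefficient using sample variances, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores). The function name `cronbach_alpha` and the response data are invented for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer the same number of items")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Invented responses: 5 respondents, 3 items on a 1 to 4 scale (illustration only).
responses = [
    [2, 3, 3],
    [1, 1, 2],
    [4, 4, 3],
    [3, 3, 4],
    [1, 2, 1],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.2f}")
```

A high alpha indicates only that the items covary. It says nothing about stability over time or agreement between raters, which is why test-retest and inter-rater reliability are assessed separately.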
Content validity was relatively frequently assessed, with moderate to strong evidence in favour overall. Criterion-related validity was rarely assessed; where it was, there was moderate evidence in favour. Many study authors noted the difficulty of gathering data for a gold standard criterion for mental health constructs, especially in conflict-affected low resource settings. Construct validity was mostly assessed using within-scale analyses (although this produced variable quality of evidence), convergent validity or some other form of hypothesis testing. Notably, responsiveness was not evaluated for any questionnaire.
For the 24 questionnaires that were adapted for use in new settings, the results of psychometric appraisal based on evidence from the original development papers (i.e. in the original setting) are presented in Table 4. Notably, a higher proportion of these papers assessed test-retest reliability, some forms of construct validity and responsiveness. The quality of evidence reported in these original development papers is also, on average, higher and more consistent than the results for the questionnaires adapted for use in conflict-affected settings.
This review included 30 studies which reported measurement properties from 33 unique questionnaires. There was high variability in the range of measurement properties reported and the quality of questionnaires. Overall, for the measurement properties reported, there was moderate evidence for reliability and validity, although there were many gaps in the availability of data.

Discussion
Our findings show the growth of publications in this area over the past two decades, reflecting the findings of other systematic reviews on mental health among conflict-affected populations in LAMICs (Charlson et al., 2019). There has also been increasing recognition of the particular importance of psychometrics in this field to facilitate the development of good quality questionnaires that can be administered by non-specialists in LAMICs (Rasmussen and Jayawickreme, 2020). However, gaps remain. There were few studies involving IDPs despite there being almost twice as many IDPs as refugees globally. In terms of outcomes, the eligible studies mostly focused on PTSD, depression or anxiety and neglected other serious mental illnesses such as psychotic disorders, alcohol use disorder and other substance misuse disorders. In addition, the vast majority of the study authors were from high-income countries (HICs), adding weight to concerns expressed elsewhere about inequitable authorship in research with conflict-affected populations in LAMICs (Sibai et al., 2019; Siriwardhana et al., 2011).
There was variation in the evidence presented for different measurement properties. Internal consistency was frequently reported with strong evidence, but this does not necessarily constitute sufficient evidence of reliability (U.S. Food and Drug Administration Center for Biologics Evaluation and Research, 2006). The majority of studies did not assess content validity and, of those that did, most did not present a conceptual framework, reflecting findings elsewhere in refugee research that questionnaires lack theoretical bases (Hollifield et al., 2002). This is an important finding, as a lack of clarity about the construct being measured will reduce the extent to which other psychometric properties can be demonstrated. An instrument without a clear conceptual underpinning is therefore less likely to be robust.
No studies reported on responsiveness or predictive validity. Given that the purpose of most of the included questionnaires is discriminative (i.e. to detect mental health disorders as part of a prevalence survey) rather than evaluative or predictive, these measurement properties are perhaps less relevant, depending on the intended use of the questionnaire. However, if a questionnaire is intended to detect clinically meaningful change (i.e. for evaluation of an intervention) then responsiveness needs to be established to ensure that the questionnaire is fit for purpose.
[Table fragment, questionnaire characteristics] (Ventevogel et al., 2007): translated and back-translated with focus group discussion; due to low levels of literacy, questionnaire administered by a trained lay interviewer; responded on a 4-point Likert scale from 1 (not at all) to 4 (extremely); score calculated by dividing the total score by number of items answered to generate an anxiety and a depression score ranging from 1 to 4; Pashtuns living in Eastern Afghanistan during the conflict attending for primary care services (Pashto); 1 month. HSCL-depression subscale (Bolton, 2001). (1) Re-experiencing (2)
[Table fragment, quality appraisal results; rating symbols are kept where they appear in the source, without their column assignments] AUDIT (Blair, 2017) +++; Community-based anger measure (Liddell, 2013); Culturally adapted checklist for complicated grief (later developed into the complicated bereavement module of the R-MHAP) (Tay, 2016); Culturally adapted checklist for PTSD and CPTSD; HSCL-25 (Elsass, 2009) +++; HSCL-25 depression subscale (Bolton, 2001); HSCL-25 (Ventevogel, 2007) ••; HTQ (Tay, Jayasuriya, et al., 2017) •; HTQ (Tay, Mohsin, et al., 2017) +++; ICD-11 Trauma Questionnaire for CPTSD (Dokkedah, 2015) •; ICD-11 Trauma Questionnaire for PTSD (Dokkedah, 2015) •; International Trauma Questionnaires (Vallières, 2018) +++; PTSD and CPTSD R-MHAP modules (Silove, 2017); PTSD and CPTSD R-MHAP modules +++. Abbreviations (fragment): [...] Package; RHS-15: Refugee Health Screener.
* These quality appraisal results are solely based on the evidence presented in the development papers for the adapted questionnaires included in the review, to allow for comparison between the evidence reported in the original settings (often non-conflict-affected) and the evidence for the questionnaires adapted for use in conflict-affected settings (as presented in Table 3).
We did not find a clear distinction in quality between newly developed questionnaires and the questionnaires adapted for use in new settings. For the questionnaires adapted in multiple different settings (e.g. the HSCL-25) there was not strong consistency in the measurement properties recorded across different settings. For the adapted questionnaires, the quality appraisal results were slightly weaker than those for the original development papers, providing weak evidence that the quality of questionnaires in conflict-affected settings is lower than in non-conflict-affected settings.
The limited availability of data makes it difficult to truly understand the differences in quality between newly developed and adapted questionnaires, or between the properties of the same questionnaire adapted in multiple different settings. Appraising the quality of the psychometric data was also made difficult by variations in psychometric nomenclature and reporting standards, as has been found by psychometric reviewers in other fields (Mokkink et al., 2010). Included studies also frequently referenced data for measurement properties from questionnaires validated in different settings, which made it difficult to apply strict psychometric criteria.
There are clearly many logistical, methodological and ethical constraints in conducting research on mental health in conflict-affected settings. Designing and conducting a high-quality validation study is a lengthy process that requires highly skilled personnel and adequate long-term funding. These are not requirements that necessarily fit well with the resources available in conflict-affected settings (Blanchet et al., 2017). The challenge lies in finding the balance between generating adequate quality and utility of evidence for questionnaire-based studies on mental disorders whilst working within resource constraints.

Recommendations
The results from this review suggest that the most pressing priorities are to: (i) conduct research equitably with more involvement of researchers from LAMICs and involving a broader range of affected populations (particularly IDPs); (ii) emphasise the need to develop a conceptual framework and fully test content validity as part of the process of developing a new questionnaire; (iii) improve reporting standards, including clearly stating the intended purpose for questionnaires and reporting measurement properties accordingly; (iv) encourage more thorough testing of reliability instead of relying solely on internal consistency; (v) establish appropriate methods for criterion-related validity when there are inadequate resources for establishing the diagnosis through clinical interview; and (vi) strengthen capacity in LAMICs for the use of such methods.
Mental health services for conflict-affected populations in LAMICs are often co-ordinated by humanitarian agencies who need adequate mental health data to guide service provision. The key policy implications from the results of this review for such humanitarian agencies and other service providers are to: (i) scrutinise the quality of the mental health questionnaires used to inform decision-making processes; (ii) acknowledge the limitations of the data gathered by such measures; (iii) define the acceptable limits for the quality of mental health measures according to the nature of the decision(s) to be made based on the data gathered; and (iv) invest adequate resources into development work for mental health measures to allow for the collection of adequate data.

Limitations
Limitations of this review include that only papers in English and French were included, so relevant data published in other languages are likely to have been missed. The identification of 5 extra articles for inclusion by manual searching indicates that, despite the broad scope of the search terms, further studies may also have been missed. Questionnaires for general psychological health and mental distress, including locally derived outcomes, were excluded because the focus of this review was on diagnostic instruments, to allow comparisons to be made across settings; we acknowledge that this limits the scope of this review.

Conclusion
This systematic review assessed the quality of mental health questionnaires that have either been developed or validated in conflict-affected settings in LAMICs. It highlighted the limited quantity and quality of questionnaires. Key priorities are to: improve equity in authorship and populations covered; strengthen research capacity on this topic; and make stronger use of conceptual frameworks and reporting standards, so that future users of the questionnaires can more easily discern whether the questionnaires are appropriate for use with other conflict-affected populations.

Declarations of Competing Interest
None.

Role of the funding source
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.