Revising the WHO verbal autopsy instrument to facilitate routine cause-of-death monitoring

Objective: Verbal autopsy (VA) is a systematic approach for determining causes of death (CoD) in populations without routine medical certification. It has mainly been used in research contexts and involved relatively lengthy interviews. Our objective here is to describe the process used to shorten, simplify, and standardise the VA process to make it feasible for application on a larger scale such as in routine civil registration and vital statistics (CRVS) systems. Methods: A literature review of existing VA instruments was undertaken. The World Health Organization (WHO) then facilitated an international consultation process to review experiences with existing VA instruments, including those from WHO, the Demographic Evaluation of Populations and their Health in Developing Countries (INDEPTH) Network, InterVA, and the Population Health Metrics Research Consortium (PHMRC). In an expert meeting, consideration was given to formulating a workable VA CoD list [with mapping to the International Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) CoD] and to the viability and utility of existing VA interview questions, with a view to undertaking systematic simplification. Findings: A revised VA CoD list was compiled enabling mapping of all ICD-10 CoD onto 62 VA cause categories, chosen on the grounds of public health significance as well as potential for ascertainment from VA. A set of 221 indicators for inclusion in the revised VA instrument was developed on the basis of accumulated experience, with appropriate skip patterns for various population sub-groups. The duration of a VA interview was reduced by about 40% with this new approach. Conclusions: The revised VA instrument resulting from this consultation process is presented here as a means of making it available for widespread use and evaluation. It is envisaged that this will be used in conjunction with automated models for assigning CoD from VA data, rather than involving physicians. 
Information on causes of death (CoD) is essential for planning, implementing, monitoring, and evaluating public health at all levels. However, death registration and CoD determination do not happen for many deaths occurring in low- and middle-income countries (LMICs), and the deaths of poorer people are much less likely to be recorded, compounding inequalities. Statistical modelling is used to fill the data gaps, for example, for maternal deaths and malaria mortality. Facilitating complete and accurate CoD determination and death registration in LMICs is therefore a high priority. In the medium term, this will involve applying verbal autopsy (VA) not only in surveillance sites and household surveys but also as a routine part of civil registration and vital statistics (CRVS) systems (1,2).
VA ascertains probable CoD through interviews carried out with caretakers of the deceased or witnesses of deaths. The method uses questionnaires to elicit pertinent information on signs, symptoms, and circumstances leading to death, generically described as indicators, which are subsequently interpreted into CoD. VA has been increasingly used in various contexts including disease surveillance, sample registration systems, outbreak investigation, and measuring the impact of public health interventions. Because vital registration coverage has not significantly improved in most LMICs, VA data collection has been conducted in a variety of settings such as clinical trials and large-scale epidemiological studies; demographic surveillance systems; national sample surveillance systems; and household surveys. The expanding use of VA in generating mortality data has led to a proliferation of different VA instruments (comprising a set of questions/indicators that elicit pertinent information on signs, symptoms and circumstances preceding death and a corresponding list of CoD) that has impaired data comparability across sites and over time. Limited attention has been given to standardization of CoD interpretation from VAs (3).
Users have different perspectives on the required level of accuracy and categories of cause-specific mortality data, with corresponding impacts on desirable characteristics of VA instruments (4). However, the need for regular nationally representative cause-specific mortality data in settings where a significant proportion of deaths are not medically certified can only be met by death registration including VA as part of national CRVS systems. This requires simpler VA instruments and operating procedures that can produce timely, readily usable and reliable cause-specific mortality data.
To produce a simplified VA instrument, the World Health Organization (WHO) carried out a systematic review of VA instruments and procedures, followed by an expert consultation. Based on accumulated experience from widely-used and validated VA procedures, consensus was reached on a simplified VA instrument for routine use in CRVS systems where deaths are not medically certified.
The 2012 WHO VA instrument comprises a short CoD list aligned to the International Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10) that is ascertainable from a limited number of indicators and amenable to automated processing. The design allows adding a narrative and locally relevant questions and diagnoses as needed. The rationale and processes used to develop the 2012 WHO VA instrument are presented in this article.

VA instruments and procedures
The WHO first encouraged the use of lay reporting of health information in 1956, and from then through the 1990s, developed lay reporting forms and published key design features for studies based on VA methods. With the expanding diversity and use of VA instruments, demands for standardization led to the development of the WHO VA standards in 2007 that included (5) 1 and by the Sample Vital Registration with Verbal Autopsy (SAVVY). 2 However, since physician-certified VA (PCVA) is time-consuming and expensive, computerized coding of VA (CCVA) methods for interpreting VA data have been investigated. Validated CCVA methods can be algorithmic or probabilistic. Algorithmic methods follow a set of predefined diagnostic criteria that can be expert- or data-derived. The Tariff method is an additive algorithm that uses Tariff scores reflecting the importance and uniqueness of each symptom to each CoD. The Artificial Neural Network (ANN) method uses computer algorithms (machine learning), applying non-linear statistics to pattern recognition. The Random Forests method is a machine learning method for interpreting VA based on patterns of indicators from a 'training dataset' (6). Whereas algorithmic methods result in binary outcomes (yes or no) for a single CoD, probabilistic methods determine the probability of a range of multiple causes. The InterVA method applies Bayesian probabilistic methods to a matrix of indicators and CoD, using conditional probabilities derived from available data and expert opinion. This method has been available in the public domain since 2006 (7,8). King and Lu's algorithmic method is able to estimate cause-specific mortality fractions (CSMFs) without individual CoD assignment. The Simplified Symptom Pattern (SSP) method is a data-driven Bayesian approach that combines the King and Lu and InterVA methods.
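The probabilistic idea described above can be illustrated with a minimal sketch. It is not the InterVA model itself: the function name, the three causes, and every probability below are hypothetical values chosen for illustration, and the sketch assumes indicators are treated as conditionally independent given the cause.

```python
from typing import Dict, Iterable


def update_cause_probabilities(
    priors: Dict[str, float],
    cond_probs: Dict[str, Dict[str, float]],
    reported_indicators: Iterable[str],
) -> Dict[str, float]:
    """Apply Bayes' rule once per reported indicator, renormalising each time.

    priors: P(cause) before any indicator is considered.
    cond_probs: cond_probs[indicator][cause] = P(indicator | cause).
    """
    posterior = dict(priors)
    for indicator in reported_indicators:
        likelihoods = cond_probs[indicator]
        # Weight each cause by how likely the indicator is under that cause.
        unnormalised = {c: p * likelihoods[c] for c, p in posterior.items()}
        total = sum(unnormalised.values())
        if total == 0:
            raise ValueError(f"indicator {indicator!r} rules out every cause")
        posterior = {c: v / total for c, v in unnormalised.items()}
    return posterior


# Hypothetical priors and conditional probabilities, for illustration only.
priors = {"malaria": 0.2, "pneumonia": 0.3, "other": 0.5}
cond_probs = {
    "fever": {"malaria": 0.9, "pneumonia": 0.7, "other": 0.2},
    "cough": {"malaria": 0.1, "pneumonia": 0.8, "other": 0.2},
}

posterior = update_cause_probabilities(priors, cond_probs, ["fever", "cough"])
most_likely = max(posterior, key=posterior.get)
```

Unlike an algorithmic rule, which returns yes or no for a single cause, the result here is a probability for every cause in the list, so several likely causes can be reported for one death.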

Review of utilization of VA instruments and procedures
Despite attempts to standardize and harmonize VA instruments, there are multiple instruments in use (9–11). We conducted a systematic literature review to determine how VA instruments have been used and the uptake of the WHO VA standards published in 2007.
The review included studies reported in peer-reviewed journals from 1986 up to early 2012. Figure 1 illustrates the review process. The WHO instruments and the three related ones briefly described above (INDEPTH, SAVVY and LSHTM) were included in the review. Instruments described as adapted from these were also included. Studies that did not provide details of the instrument used were excluded. A brief description of the 125 eligible studies is available as a Supplementary File. Some studies applied different VA interpretation methods on the same dataset and were counted as a single study for the review of the use of the VA instruments. The selected VA instruments or their adaptations were reported to be used by 112 studies in 41 countries. Table 1 summarizes the identified studies, data collection period, and number of deaths certified, by each VA instrument. VA was mostly used as a research tool in longitudinal health and demographic surveillance and in intervention or epidemiological studies. The first study identified used an adapted version of an early WHO instrument to certify perinatal deaths in Nepal in 1989 (12). From the 112 reviewed studies, 104 reported the number of deaths certified, totalling 159,316. Studies using the INDEPTH instrument certified the largest number of deaths, ranging from 100 to 38,306 deaths with a mean of 4269.4, totalling 72,579 deaths (Table 1).
1 INDEPTH (www.indepth-network.org) is a network of member centres that conduct longitudinal health and demographic evaluation of populations in LMICs. INDEPTH has built a network of 44 health and demographic surveillance systems (HDSS) across 20 countries in Africa, Asia, and Oceania. The network strengthens capacity of HDSS centres, and mounts multicentre research to guide health priorities and policies in LMICs, based on up-to-date empirical scientific evidence. The network uses VA as a method for determining CoD.
2 SAVVY, proposed by MEASURE Evaluation and the International Programs Center, U.S. Census Bureau, is a system to generate reliable information on mortality levels and CoD at the national level. The SAVVY resource library is a series of best practice manuals and methods for improving the quality of vital statistics where high coverage of civil registration and good CoD data are not available. A SAVVY system collects mortality data from a number of sites throughout a country using multistage probability sampling. SAVVY methods include determination of CoD with VA.
VA has also been applied in national health surveys. In most surveys (e.g. Nepal [...] administered for deaths of all ages. In Mozambique, a post-census VA was conducted in 2008. All surveys ask for medical certification of the CoD, but the majority rely on VA using a variety of questionnaires. Table 1 and Fig. 2 show that the majority of reviewed studies had sites in Africa (54.5%) and Asia (40.2%), while some were conducted in Central and South America (8.9%); these percentages sum to more than 100% because some multicentre studies had sites in more than one continent. The majority of studies using the WHO (61.3%), [...] it is difficult to assess the level of uptake of the WHO VA standards, as trends in more recent data collection years may be difficult to interpret due to delays in publication of results, particularly given delays in PCVA interpretation in some sites.
Age groups were reported by 110 studies. For comparisons, age groups were categorized non-exclusively as: stillbirths; under 4 weeks; 4 weeks to 5 years; under 15 years; 15 years and above; maternal deaths; and all age groups (Table 2). VA instruments have mostly been used for 15 years and above (26.4%) and for all age groups (22.7%). Deaths in children under 5 years old (18.2%) and neonates (18.2%) have also been widely studied.
The most common interpretation method (more than one was used in some studies) was the PCVA (82.9%), followed by probabilistic methods (11.7%), and algorithms (10.8%) (2). Of probabilistic methods, InterVA was most used (61.5%). Only one study used ANN, Random Forest, SSP, Tariff, or King and Lu methods to ascertain CoD.
Validity studies for VA procedures are fraught with difficulties since there is no widely available gold standard, particularly for the majority of LMIC deaths, which do not occur in health facilities (13). The validity of VA is typically assessed by comparing VA results with hospital medical records as gold standard diagnoses for CoD, as well as by making between-method comparisons (e.g. between PCVA and CCVA). The validity of the overall VA process is influenced by the design and content of the questionnaires, field procedures, data interpretation methods, actual CoD patterns, and characteristics of the deceased (14).
Of the 125 studies reviewed, 26 assessed performance of VA procedures in certifying CoD (studies using the same VA dataset but different CoD interpretation methods and/or assessing different validation parameters were included in the review and counted as individual studies) (Tables 3 and 4).
Our review of VA studies published up to 2012 highlights variability in the selection, development, and use of VA instruments, as well as in methods of assessment. The review established that there are many adaptations of standard VA instruments. Although instruments may need to be adapted to local contexts, the extent of modifications was not reported by studies, and their impact on VA performance and accuracy is not known. The review was hindered by an absence of information on the VA instrument used by a substantial number of studies. The lack of systematic detailed information on methods used undermines the value of experience sharing on use of VA instruments and limits a more accurate understanding of the use of the different instruments and uptake of VA guidelines. Some reports on using VA may have been missed if written in other languages or as yet unpublished.

Simplification of VA standards: the 2012 WHO VA instrument
In December 2011, following the above review process, consensus over a simplified VA instrument was reached among 37 experts from 15 countries in a meeting organized by WHO in collaboration with the University of Queensland, the Health Metrics Network and INDEPTH.
The meeting was followed by a 2-day workshop during which the outcomes of the discussions were consolidated. Participants included key stakeholders, researchers, and those who work routinely with VA instruments. The 2012 WHO VA instrument comprises a total of 221 CoD-related indicators to certify 62 CoD. The instrument is designed primarily for electronic data capture, and WHO data collection software will facilitate this on generic mobile devices. CoD interpretation software also allows assessment without physicians, reducing cost and time lag in VA interpretation, and enhancing comparability across different settings and over time. For those wanting to use paper capture and PCVA, simplified sample questionnaires have been developed for three age groups: under 4 weeks; 4 weeks to 14 years; and 15 years and over, which are available with all other aspects of the 2012 WHO VA instrument at www.who.int/healthinfo/statistics/verbalautopsystandards. As determined by extensive skip patterns, the maximum number of questions to be asked for any death ranges from 104 for a neonatal death to 130 for a maternal death (Table 5). Although users may need to add locally relevant questions, the instrument as defined here should be regarded as the core.
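How skip patterns restrict the questions asked for a given death can be sketched as follows. This is only an illustration for electronic data capture: the question codes, the question wording, and the eligibility rule for pregnancy questions are hypothetical and are not taken from the 2012 WHO VA instrument. Only the neonatal threshold (under 4 weeks) follows the age groups named above.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Deceased:
    age_days: int
    sex: str  # "female" or "male"


@dataclass(frozen=True)
class Question:
    code: str  # hypothetical identifier, not a WHO indicator code
    text: str
    applies_to: Callable[[Deceased], bool]


def anyone(d: Deceased) -> bool:
    return True


def is_neonate(d: Deceased) -> bool:
    # "Under 4 weeks" age group.
    return d.age_days < 28


def is_adult_woman(d: Deceased) -> bool:
    # Assumed eligibility rule for pregnancy-related questions.
    return d.sex == "female" and d.age_days >= 15 * 365


QUESTIONS: List[Question] = [
    Question("Q_FEVER", "Did the deceased have a fever?", anyone),
    Question("Q_SUCKLE", "Was the baby able to suckle normally?", is_neonate),
    Question("Q_PREGNANT", "Was she pregnant at the time of death?", is_adult_woman),
]


def applicable_questions(d: Deceased) -> List[str]:
    """Return the codes of the questions that should be asked for this death."""
    return [q.code for q in QUESTIONS if q.applies_to(d)]


neonate_codes = applicable_questions(Deceased(age_days=10, sex="female"))
woman_codes = applicable_questions(Deceased(age_days=30 * 365, sex="female"))
man_codes = applicable_questions(Deceased(age_days=30 * 365, sex="male"))
```

Keeping the eligibility rule next to each question means the interviewer only sees the subset relevant to the death, which is what keeps the number of questions per interview well below the full indicator count.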

Simplified list of CoD
To develop a VA instrument appropriate for strengthening countries' CRVS systems, we simplified the WHO VA standards; this commenced with generating a shorter list of CoD. Three main criteria characterized essential CoD: (1) Importance: most frequent CoD of global public health relevance (e.g. acute respiratory infections); (2) Diagnostic Feasibility: CoD associated with recognizable symptoms ascertainable by VA (e.g. HIV/AIDS); and (3) Potential for intervention: CoD can be addressed by public health interventions (e.g. diarrhoeal diseases).
Comparison of the results of the most widely used and validated VA instruments and interpretation approaches, including PCVA, InterVA, and Population Health Metrics Research Consortium (PHMRC) methods, enabled the identification of a core group of CoD that can be certified by VA. This core group of CoD was mapped against the 31 causes reported in the 2004 Global Burden of Disease (GBD) study to ascertain the public health importance of individual causes. Finally, consensus on the simplified list of CoD was reached in the meeting with VA experts, based on their experience and available evidence.
In the 2007 WHO VA standards, there were 106 possible CoD to be assigned by physicians, while InterVA-3 and InterVA-M assigned 48 causes and the PHMRC VA instrument reached 51 (5,31,39). To facilitate comparison, some CoD from the WHO VA standards were re-categorized, creating a set of mutually exclusive, collectively exhaustive CoD categories. Table 6 displays the results from the review and correlation of CoD between the VA instruments and the GBD.
In the review of 125 studies covering 199,158 deaths described above, we collated evidence on CoD certified by VA and reported in studies to illustrate the range of CoD that were observed and certifiable by VA. The top 10 CoD reported were: 'other and unspecified cardiac disease' (44%); 'intestinal infectious diseases' (40.8%); 'acute respiratory infections, including pneumonia' (37.6%); 'HIV/AIDS-related death' (36.8%); 'pulmonary tuberculosis' [...] 'lack of food and/or water', and 'legal intervention' have not been certified by VA in any of the reviewed studies.
Elimination of CoD was based on low frequency reported by VA studies, not being included in the other VA instruments, and on experts' judgment about their importance, feasibility and intervention potential. As a result, 27 CoD from the 2007 WHO VA standards were subsumed into residual categories, including 'typhoid and paratyphoid', 'leishmaniasis', 'melanoma of skin', 'lack of food and/or water' and 'legal intervention' (Table 7).
The inclusion of the majority of CoD in the simplified CoD list was based on the consistency between CoD from the WHO VA standards and those from InterVA and PHMRC VA, GBD estimates and coverage in VA studies. All causes included in the GBD and the top 10 most certified CoD reported were retained. During expert meetings, the CoD 'other and unspecified non-communicable disease', 'sepsis', 'anaemia of pregnancy' and 'ruptured uterus' were added to the list. Although not in the GBD or among the most commonly certified CoD, they were considered feasible for VA certification, provide key information to CRVS, contribute significant mortality burdens and are responsive to interventions. Further modifications included grouping related CoD not having readily distinguishable symptoms into broader categories. For example, 'malignant neoplasm of cervix' and 'malignant neoplasm of uterus' were combined into 'female reproductive neoplasms'. Overall, the simplification process led to a 41.5% reduction in CoD compared with the WHO VA standards CoD list, resulting in 60 CoD. A further two categories were added for fresh and macerated stillbirths, despite these not strictly being considered CoD, because of their importance in some settings. Table 8 presents the simplified VA CoD list, structuring the causes into groupings consistent with ICD-10 and showing in the last column how all ICD-10 codes map onto the 62 CoD.
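The idea of mapping every ICD-10 code onto exactly one VA category can be sketched as a range lookup with a residual fallback. The three ranges below are standard ICD-10 blocks, but the table is an illustrative fragment with assumed category labels and an assumed residual name; it is not a reproduction of Table 8.

```python
from typing import List, Tuple

# (first category, last category, VA cause label). Illustrative fragment only.
ILLUSTRATIVE_RANGES: List[Tuple[str, str, str]] = [
    ("A00", "A09", "diarrhoeal diseases"),
    ("B20", "B24", "HIV/AIDS-related death"),
    ("B50", "B54", "malaria"),
]

# Hypothetical residual label; a fallback keeps the mapping collectively exhaustive.
RESIDUAL = "other and unspecified"


def three_character_category(icd10_code: str) -> str:
    """Reduce a code such as 'b20.1' to its three-character category 'B20'."""
    code = icd10_code.strip().upper()
    if len(code) < 3 or not code[0].isalpha() or not code[1:3].isdigit():
        raise ValueError(f"not a recognisable ICD-10 code: {icd10_code!r}")
    return code[:3]


def va_category(icd10_code: str) -> str:
    """Return the single VA cause category for an ICD-10 code."""
    key = three_character_category(icd10_code)
    for first, last, label in ILLUSTRATIVE_RANGES:
        # Three-character categories sort correctly as plain strings.
        if first <= key <= last:
            return label
    return RESIDUAL


hiv = va_category("B20.1")
diarrhoea = va_category("a09")
unmapped = va_category("J18.9")
```

Because the ranges do not overlap and unmatched codes fall into the residual category, every code receives one and only one category, which is the mutually exclusive, collectively exhaustive property the list aims for.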

VA questionnaires and indicators
VA questionnaires ask specific questions about signs, symptoms, complaints, or contextual factors that will lead to determining the most probable CoD. Such information, which indicates the possibility of specific causes, is collectively termed 'indicators'. The review aimed to collate evidence from field experience on: (i) specific modifications made to VA questionnaires and their rationales; (ii) utility of specific indicators for CoD ascertainment; and (iii) identification of most and least specific indicators for reaching diagnoses. From the 125 studies reviewed, contact was attempted with 45 randomly selected authors (one per study, unless referred to another), and established with 27. Limited feedback was gathered on specific indicators, as most researchers were not able to report on specific modifications made to the VA instruments, and they found it challenging to give feedback on the utility, value and specificity of individual questionnaire indicators. The following alterations to standard instruments were reported: (1) Structural rearrangement of order and categorization of questionnaire modules and changes in targeted age groups; (2) Attempts to shorten the VA questionnaires by removal and modification of questions related to the duration of signs and symptoms; and (3) Addition of disease-specific questions for local conditions and research needs.
Overall, users considered the 2007 WHO VA standards too long and time-consuming, expressing a desire for shorter and more practical instruments. This process of simplification was started by drafting diagnostic criteria for each CoD by listing symptoms indicated in the Oxford Textbook of Medicine (40). Subsequently, experts identified essential indicators for differentiating CoD, and inclusion/exclusion was based on likely recognition, recollection, and reporting in VA interviews. Evidence of indicators' utility was gathered by correlating indicators from WHO VA standards, InterVA, and PHMRC VA procedures. Furthermore, the simplification of the WHO VA standards indicators was informed by a progressive item reduction process based on the Tariff method (27). Participating experts from PHMRC had applied the Tariff method to the PHMRC validation dataset and tested the effect of dropping items or sets of items on chance-corrected concordance and CSMF accuracy. These findings comprised one element of the discussion on the evidence base for some CoD. Indicators removed had low specificity and possibly generated answers with low reliability due to recall difficulties. These were mainly sub-indicators on the duration, frequency, and development of signs and symptoms. Other modifications made included the addition of indicators from InterVA and PHMRC VA instruments, the removal of overlapping indicators capturing very similar information, and the inclusion of social context indicators, facilitating use of the instrument in non-enumerated populations. Overall, 164 indicators were retained from the 2007 WHO VA standard, 57 new indicators were introduced, and 244 of the 408 indicators in the 2007 WHO VA standard were excluded.
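The text does not define CSMF accuracy. As a hedged illustration, the sketch below assumes the definition commonly used in the VA validation literature: one minus the summed absolute error in cause fractions, divided by the largest error possible, which is twice one minus the smallest true fraction. The function name and the example fractions are hypothetical.

```python
from typing import Dict


def csmf_accuracy(true_csmf: Dict[str, float], predicted_csmf: Dict[str, float]) -> float:
    """CSMF accuracy under the assumed definition.

    1.0 means the predicted cause fractions match the true ones exactly;
    0.0 means the worst possible prediction for this true distribution.
    """
    causes = set(true_csmf) | set(predicted_csmf)
    absolute_error = sum(
        abs(true_csmf.get(c, 0.0) - predicted_csmf.get(c, 0.0)) for c in causes
    )
    # Worst case: every death assigned to the cause with the smallest true fraction.
    worst_possible_error = 2.0 * (1.0 - min(true_csmf.values()))
    return 1.0 - absolute_error / worst_possible_error


# Hypothetical cause fractions, for illustration only.
true_fractions = {"cause_a": 0.5, "cause_b": 0.3, "cause_c": 0.2}
predicted_fractions = {"cause_a": 0.4, "cause_b": 0.4, "cause_c": 0.2}

accuracy = csmf_accuracy(true_fractions, predicted_fractions)
perfect = csmf_accuracy(true_fractions, true_fractions)
worst = csmf_accuracy(true_fractions, {"cause_a": 0.0, "cause_b": 0.0, "cause_c": 1.0})
```

In an item reduction exercise of the kind described, a metric like this would be recomputed after dropping each candidate item, and items whose removal leaves the score essentially unchanged become candidates for exclusion.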
Review by expert groups (for relevance to the list of causes, reliability, and feasibility) and comparison with machine assessment analysis led to a reduction of 45.8% in the number of CoD-related indicators relative to the WHO VA standards, resulting in a total of 221 indicators (of which various subsets apply to particular population sub-groups).
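The indicator counts reported above are mutually consistent, as a short check shows. The variable names are illustrative; the numbers are those given in the text.

```python
# Counts reported in the text for the 2007 standard and the 2012 instrument.
indicators_2007 = 408
retained = 164
newly_added = 57

# Indicators dropped from the 2007 standard.
excluded = indicators_2007 - retained

# Size of the 2012 instrument: retained plus newly introduced indicators.
indicators_2012 = retained + newly_added

# Relative reduction in the number of indicators.
reduction = (indicators_2007 - indicators_2012) / indicators_2007
reduction_label = f"{reduction:.1%}"

print(excluded, indicators_2012, reduction_label)
```

The check reproduces the 244 excluded indicators, the total of 221, and the 45.8% reduction.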

Application of the 2012 WHO VA instrument to facilitate routine surveillance
The need for consensus on simplified technical standards and guidelines for VA, together with their widespread endorsement and adoption, has become urgent. The systematic use of the 2012 WHO VA instrument will strengthen countries' CRVS systems. In the past decade, methodological developments in automated methods for VA assessment have created a shift away from limited individual-level and clinical paradigms towards population-based epidemiological and public health thinking. To facilitate application in routine surveillance systems, the new simplified VA instrument was specifically developed for automated ascertainment of CoD. At present, the InterVA-4 model, as previously described (41), is the only available automated interpretation tool fully aligned with the 2012 WHO VA instrument. A simple, automatically interpreted VA process will lead to increased coverage of operational and representative CRVS systems. Shorter and simpler interviews not needing physicians for CoD interpretation will facilitate collection of adequate data for CRVS systems. CCVA brings efficiency and consistency by providing a standardized interpretation of VA. The new 2012 WHO VA instrument will be piloted, modified, and integrated into national health information systems.