Causes of death among persons of all ages within the Kilifi Health and Demographic Surveillance System, Kenya, determined from verbal autopsies interpreted using the InterVA-4 model

Background: The vast majority of deaths in the Kilifi study area are not recorded through official systems of vital registration. As a result, few data are available regarding causes of death in this population.
Objective: To describe the causes of death (CODs) among residents of all ages within the Kilifi Health and Demographic Surveillance System (KHDSS) on the coast of Kenya.
Design: Verbal autopsies (VAs) were conducted using the 2007 World Health Organization (WHO) standard VA questionnaires, and the VA data were further transformed to align with the 2012 WHO VA instrument. CODs were then determined using the InterVA-4 computer-based probabilistic model.
Results: Five thousand one hundred and eighty-seven deaths were recorded between January 2008 and December 2011. VA interviews were completed for 4,460 (86%) deaths. Neonatal pneumonia and birth asphyxia were the main CODs in neonates; pneumonia and malaria were the main CODs among infants and children aged 1–4, respectively, while HIV/AIDS was the main COD for adult women of reproductive age. Road traffic accidents were more commonly observed among men than women. Stroke and neoplasms were common CODs among the elderly over the age of 65.
Conclusions: We have established the main CODs among people of all ages within the area served by the KHDSS on the coast of Kenya using the 2007 WHO VA questionnaire coded using InterVA-4. We hope that our data will allow local health planners to estimate the burden of various diseases and to allocate their limited resources more appropriately.

and symptoms that precede deaths (2, 3), it remains the only practical approach through which to estimate CODs in most resource-poor settings, in which the majority of deaths occur outside the formal health system (2, 4–8). The approach has been used to determine CODs among people of all ages in a wide range of settings (2, 7, 9–16); however, in Kenya, although VAs have been widely used as a tool for documenting CODs (7, 12–14, 17–21), no study has yet been published in which the WHO-compliant InterVA-4 model has been applied to deaths at all ages. The InterVA-4 model calculates posterior probabilities for CODs given an a priori distribution of CODs in the population and conditional probabilities of the circumstances leading to death. The current study was conducted with a view to using the InterVA-4 model to describe CODs among all residents of the area served by the Kilifi Health and Demographic Surveillance System (KHDSS), where close to 60% of deaths occur outside the formal health system. This study is also part of the multisite cause of death dataset (22) available in the public domain at the INDEPTH Data Repository (23).
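The Bayesian calculation summarised above can be illustrated with a minimal Python sketch. It assumes a simplified, naive-Bayes style update in which the prior over causes is multiplied by P(indicator | cause) for each indicator reported as present and then renormalised. The function `posterior` and all causes, indicators, and probabilities below are hypothetical; they do not reproduce InterVA-4's own probability tables or source code.

```python
def posterior(prior: dict[str, float],
              cond: dict[str, dict[str, float]],
              present: set[str]) -> dict[str, float]:
    """Update a prior over causes of death with each indicator reported present.

    prior:   P(cause) for every cause (hypothetical values)
    cond:    cond[indicator][cause] = P(indicator | cause) (hypothetical values)
    present: indicators reported as present for one death
    """
    probs = dict(prior)
    for indicator in sorted(present):
        # Bayes' theorem: multiply by P(indicator | cause), then renormalise.
        for cause in probs:
            probs[cause] *= cond[indicator][cause]
        total = sum(probs.values())
        if total == 0:
            raise ValueError(f"indicator {indicator!r} is impossible under every cause")
        probs = {cause: p / total for cause, p in probs.items()}
    return probs


# Hypothetical two-cause example: fever and cough both reported.
prior = {"Malaria": 0.5, "Pneumonia": 0.5}
cond = {"fever": {"Malaria": 0.9, "Pneumonia": 0.6},
        "cough": {"Malaria": 0.2, "Pneumonia": 0.8}}
print(posterior(prior, cond, {"fever", "cough"}))
```

Because each step renormalises, the final result is the same as multiplying all the conditional probabilities first and normalising once; the step-wise form simply keeps the numbers in a readable range.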

Study population
The KHDSS study area has been described in detail previously (24). Briefly, the KHDSS was established in 2000 and became a member of the INDEPTH Network in 2005. The KHDSS covers an area of 891 km² and has a current resident population of approximately 265,000. The population register is updated three times each year and records between 1,200 and 1,500 deaths annually.
VAs, using the standard 2007 World Health Organization (WHO) VA tools (25), were first introduced into KHDSS activities in 2008 (12). These tools include three separate VA questionnaires that are used to collect data on neonates (0–28 days old), children (29 days to 14 years old), and adolescents and adults (≥15 years). Structurally, the questionnaires contain a short open narrative section followed by a cascade of closed questions. The narrative part provides the respondent's chronological account of any illnesses and events that led to death, while the closed questions filter through the history and details of the illness. Age-appropriate VA questionnaires were generated for each death, along with household maps and listings to guide interviewers to the appropriate respondents. With a view to minimizing transcription errors, we generated pre-filled VA questionnaires that included all basic demographic details, such as names, personal identifiers, and dates of birth and death, for confirmation by the interviewers before their interviews. Completed VA forms were checked for errors by the field supervisor before entry into a database written in FileMaker Pro™ v11 (FileMaker, USA).
The InterVA-4 model
CODs were assigned using the InterVA-4 computer-based probabilistic model (25). The original model used an expert panel to develop a coding system based on relatively broad COD categories. The current version (version 4.02) (25) has evolved and has benefitted from testing and validation, largely using data from sub-Saharan Africa (4, 12, 13). The InterVA-4 software is freely available in the public domain (25). For our analysis, COD categories were obtained by running InterVA-4 in batch mode on the input indicators, with both malaria and HIV prevalence set to 'high' for all age categories.

Data management
All VA forms were scrutinized by a clinician for anomalies before double entry into a computer database written in FileMaker Pro™ v11 (FileMaker, USA). The VA data were further transformed to align with the WHO 2012 VA standard format and were then processed using the InterVA-4 model to assign CODs. These VA standard formats and their specifications have been described in detail previously (26). Where available, we supplemented the model input with information from the open narrative section for input variables that were not targeted systematically in the closed section of the VA questionnaires. Of particular note, we included data on a set of reported hospital diagnoses that were required as input data for InterVA-4 but that were not included in the list of previously known medical conditions in the 2007 WHO questionnaires. Examples included sickle cell disease (SCD) and congenital malformations. Such information was extracted by keyword searches of the free-text narrative section. For example, to create the congenital malformation (born_malf) indicator, we searched for the strings 'heart', 'congenital', 'hydrocephalous', 'spina bifida', and 'hole in the heart' within VAs collected from children who died at <5 years of age. The congenital malformation indicator was left unchanged if it had already been marked 'YES' but was converted from 'NO' to 'YES' in cases where heart disease had been selected in the list of previously known medical conditions. Similarly, injuries associated with sexual assault (rape) were not always captured in the structured questions but were sometimes described in the narrative section, allowing them to be mapped to the 'assault' indicator. Injury may be either intentional (e.g. assault or suicide) or accidental. The main warning messages flagged by the model when we entered our data involved cases where, for example, suicide was inconsistent with other injuries.
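The keyword mapping for the born_malf indicator can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the study's actual script: the function `map_born_malf` and its argument names are hypothetical, matching is assumed to be a case-insensitive substring search, and both the keyword rule and the heart-disease rule are assumed to apply only to deaths under 5 years of age.

```python
# Search strings quoted in the text for the born_malf indicator.
MALFORMATION_KEYWORDS = ("heart", "congenital", "hydrocephalous",
                         "spina bifida", "hole in the heart")


def map_born_malf(narrative: str, age_years: float, born_malf: str,
                  heart_disease_listed: bool) -> str:
    """Return the born_malf indicator ('YES' or 'NO') for one VA (hypothetical helper).

    narrative:            free-text open narrative
    age_years:            age at death in years
    born_malf:            value from the closed questions
    heart_disease_listed: heart disease ticked among previously known conditions
    """
    if born_malf == "YES" or age_years >= 5:
        return born_malf  # already positive, or outside the searched age range
    if heart_disease_listed:
        return "YES"  # convert 'NO' to 'YES' when heart disease was selected
    text = narrative.lower()  # case-insensitive substring match (assumption)
    if any(keyword in text for keyword in MALFORMATION_KEYWORDS):
        return "YES"
    return born_malf


print(map_born_malf("Doctors said the child had a hole in the heart", 0.5, "NO", False))
# prints YES
```

A plain substring search is the simplest reading of the text; it will also match unrelated words containing 'heart', so flagged narratives may still merit a manual check.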
Finally, the variable 'born_small' requires either the birth weight or data suggesting that the child was small at birth, as reported by the respondent. Some respondents indicated that the baby was born small while the recorded birth weight was ≥2.5 kg, or vice versa. Where responses were recorded for both indicators, therefore, we considered only the birth weight in mapping the 'low birth weight' (born_small) or 'big baby' (born_big) indicators.
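The rule that recorded birth weight overrides the respondent's impression can be expressed as a small Python function. This is a sketch under assumptions: `map_birth_size` is a hypothetical helper, born_small is taken to mean a birth weight below 2.5 kg (the threshold implied in the text), and the born_big cut-off is not given in the text, so it is passed in as a hypothetical parameter.

```python
from typing import Optional

# Threshold implied by the text: a recorded birth weight of >= 2.5 kg
# contradicts a report that the baby was small.
LOW_BIRTH_WEIGHT_KG = 2.5


def map_birth_size(birth_weight_kg: Optional[float],
                   reported_size: Optional[str],
                   big_baby_kg: float) -> dict[str, bool]:
    """Derive born_small / born_big, letting recorded birth weight win.

    reported_size: respondent's impression, "small", "big", or None.
    big_baby_kg:   HYPOTHETICAL cut-off for born_big; not given in the text.
    """
    if birth_weight_kg is not None:
        # Both sources may be present: use only the recorded birth weight.
        return {"born_small": birth_weight_kg < LOW_BIRTH_WEIGHT_KG,
                "born_big": birth_weight_kg > big_baby_kg}
    # No birth weight recorded: fall back on the respondent's report.
    return {"born_small": reported_size == "small",
            "born_big": reported_size == "big"}


# A baby reported as small but weighing 3.1 kg is not coded born_small.
print(map_birth_size(3.1, "small", big_baby_kg=4.0))
```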

Data analysis
Deaths were aggregated for all individuals in the study population for the period January 2008 to December 2011. An analytical dataset was constructed from the model's output in which each VA case had one or more records, each with one cause and a weight corresponding to the likelihood of that cause for the particular VA case. The model assigned multiple CODs when the likelihood of a further cause reached at least half that of the leading cause. Any residual likelihood not accounted for by the first, second, or third causes was considered indeterminate. The possible CODs determined by the InterVA-4 model were derived for all ages. Cause-specific mortality fractions (CSMFs) were calculated as the proportion of all deaths attributable to each specific COD. We stratified mortality analyses by sex and seven age groups (neonates, infants, and ages 1–4, 5–14, 15–49, 50–64, and over 65 years). The age-group boundaries were chosen to reflect groups of public health importance. All statistical analyses were carried out using STATA V11 (Timberlake, USA) and the statistical software environment R (http://www.r-project.org/).
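A minimal Python sketch of the output handling described above follows, assuming each case's InterVA-4 output has already been read into a dictionary mapping cause names to likelihoods that sum to no more than 1. The functions `select_causes` and `csmf` are hypothetical helpers for illustration. They apply only the rules stated here (up to three causes, each at least half as likely as the leading cause, with the residual recorded as indeterminate) and omit any further thresholds the model itself may apply.

```python
from collections import defaultdict


def select_causes(likelihoods: dict[str, float],
                  max_causes: int = 3) -> list[tuple[str, float]]:
    """Turn one case's cause likelihoods into weighted (cause, weight) records.

    Keeps up to `max_causes` causes whose likelihood is at least half that of
    the leading cause; any residual likelihood is recorded as 'Indeterminate'.
    """
    ranked = sorted(likelihoods.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked or ranked[0][1] <= 0:
        return [("Indeterminate", 1.0)]
    leading = ranked[0][1]
    records = [(cause, p) for cause, p in ranked[:max_causes]
               if p >= 0.5 * leading]
    residual = 1.0 - sum(p for _, p in records)
    if residual > 1e-9:
        records.append(("Indeterminate", residual))
    return records


def csmf(cases: list[dict[str, float]]) -> dict[str, float]:
    """Cause-specific mortality fractions: summed weights divided by case count."""
    if not cases:
        return {}
    totals: dict[str, float] = defaultdict(float)
    for likelihoods in cases:
        for cause, weight in select_causes(likelihoods):
            totals[cause] += weight
    return {cause: total / len(cases) for cause, total in totals.items()}


# Two hypothetical cases with made-up likelihoods.
cases = [{"Malaria": 0.6, "Pneumonia": 0.35, "HIV/AIDS": 0.05},
         {"Pneumonia": 0.9, "Malaria": 0.1}]
print(csmf(cases))
```

Because every case contributes weights summing to one, the fractions across all causes (including 'Indeterminate') sum to one, so each death is counted exactly once in the CSMFs.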

Ethical approval
Individual written informed consent was obtained by interviewers from all VA respondents. The study was approved by the KEMRI/Wellcome Trust Scientific Coordinating Committee (SCC) and by the KEMRI Scientific Steering Committee (SSC), both in Kilifi, and by the KEMRI/National Ethical Review Committee (ERC) in Nairobi.

Results
A total of 5,187 deaths were recorded among the resident population of the KHDSS between January 2008 and December 2011. One thousand one hundred and eighty-one cases (23%) were among children <5 years old, 242 cases (5%) were aged 5–14 years, and 3,764 cases (73%) were ≥15 years of age, including 1,665 cases (32% of all deaths) aged ≥65 years (Table 1). Of the deaths among children <5 years old, 810 (68%) occurred within the first year of life, of which 469 (58%) were neonatal deaths. Male deaths outnumbered female deaths in most age groups. VA interviews were successfully completed for 4,460 (86%) of these deaths, of which 2,304 (52%) were among males (Table 1). Failure to identify an appropriate respondent was the main reason for missing data. Among deaths for which a VA was performed, 57% occurred at home and 36% in a health facility. Most neonatal deaths occurred in hospitals, while more of the elderly died at home. The average recall period (time between death and VA interview) was 194 days. Among those aged ≥15 years (n = 3,310), 10% were single, 55% were legally married, 28% were widowed, 4% were divorced, and 3% were separated. The InterVA-4 model assigned a single COD to 3,886 cases (87%), two CODs to 422 cases (9%), and three CODs to 22 cases (1%), and classified 130 cases (3%) as 'indeterminate'.
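Most of the percentages above can be reproduced from the stated counts. The short Python check below (the helper `pct` and the dictionary keys are illustrative names) rounds each share to the nearest whole percent and confirms that the age-group counts and the COD-assignment counts add up to their respective totals.

```python
RECORDED = 5187   # deaths recorded, January 2008 to December 2011
COMPLETED = 4460  # deaths with a completed VA interview


def pct(part: int, whole: int) -> int:
    """Share of `whole` as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Age groups among all recorded deaths must add up to the total.
age_groups = {"<5 years": 1181, "5-14 years": 242, ">=15 years": 3764}
assert sum(age_groups.values()) == RECORDED

# Number of CODs assigned by InterVA-4 among completed VAs.
cod_counts = {"one": 3886, "two": 422, "three": 22, "indeterminate": 130}
assert sum(cod_counts.values()) == COMPLETED

shares = {
    "VA completed": pct(COMPLETED, RECORDED),
    **{group: pct(n, RECORDED) for group, n in age_groups.items()},
    ">=65 years": pct(1665, RECORDED),
    "one COD": pct(cod_counts["one"], COMPLETED),
    "two CODs": pct(cod_counts["two"], COMPLETED),
    "indeterminate": pct(cod_counts["indeterminate"], COMPLETED),
}
print(shares)
```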

Cause-specific mortality fractions
The top five CODs for the whole population were HIV/AIDS-related causes, acute respiratory infections including pneumonia, malaria, and pulmonary tuberculosis. Pneumonia was the most common COD among infants, while malaria and HIV were the commonest CODs among children aged 1–4 years and adults aged 15–49 years, respectively. Pulmonary tuberculosis was the leading COD among the elderly age groups.

Infant deaths (n = 286)
The top CODs in infants (excluding neonates) were pneumonia, HIV/AIDS, and malaria, which accounted for 28, 21, and 12% of deaths, respectively. Five percent of deaths were attributed to sickle cell crisis, while other COD categories totalled 14%. Twelve percent of deaths were indeterminate (Table 2). No sex-specific differences were noted, with the exception of pneumonia, which was more common in female than male infants (Fig. 1).
Children 1–4 years old (n = 302)
Malaria, HIV/AIDS, and pneumonia were the predominant CODs in this age group, causing 23, 18, and 12% of childhood deaths, respectively, while diarrheal diseases, severe malnutrition, sickle cell with crisis, and road traffic accidents explained 3, 2, 2, and 1% of deaths, respectively. All other CODs accounted for 7%, while a further 9% of deaths were indeterminate (Table 2). A slightly higher proportion of girls than boys died of pneumonia, while no differences were noted between the sexes for deaths due to malaria (Fig. 1).

5–14 years old (n = 218)
Relatively few deaths were observed in this age group. The commonest CODs were HIV/AIDS-related causes and malaria, which accounted for 23 and 13% of total deaths, respectively. Deaths due to pneumonia, road traffic accidents, meningitis/encephalitis, and sickle cell with crisis explained 8, 7, 5, and 4% of deaths, respectively. Eight percent of cases were indeterminate (Table 2). Unlike in the previous age group, a higher proportion of boys than girls appeared to die of pneumonia and external CODs (Fig. 1).

15–49 years old (n = 1,068)
A third of deaths in this age group were attributed to HIV/AIDS and pulmonary tuberculosis. The majority of those who died of HIV/AIDS were either divorced or separated (60%). Road traffic accidents, assault, and stroke accounted for 5, 3, and 3% of deaths, respectively. Overall, 7% of deaths were indeterminate (Table 2). A higher proportion of women (29.1%) than men (12.2%) in this age group died of HIV/AIDS, while more men (23.8%) than women (2%) died of external CODs (Fig. 1). A total of 62/565 deaths among women of reproductive age were due to maternal causes, of which the two most common were obstetric hemorrhage (51.6%) and pregnancy-induced hypertension (25.8%).
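As a quick arithmetic check, the maternal-cause figures above imply that maternal causes accounted for about 11% of deaths among women of reproductive age, and that the two percentages correspond to whole-number counts of about 32 and 16 deaths. These counts are back-calculated from the rounded percentages rather than reported directly.

```latex
\frac{62}{565} \approx 0.110, \qquad
0.516 \times 62 \approx 32 \ \text{(obstetric hemorrhage)}, \qquad
0.258 \times 62 \approx 16 \ \text{(pregnancy-induced hypertension)}
```

Conversely, 32/62 ≈ 51.6% and 16/62 ≈ 25.8%, which matches the reported values.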
50–64 years old (n = 729)
All neoplasms, pulmonary tuberculosis, HIV/AIDS-related deaths, and stroke were the predominant CODs in this age group, accounting for 16, 12, 12, and 10% of deaths, respectively. Other unspecified cardiac diseases explained 5% of deaths. Eight percent of deaths were indeterminate (Table 2). More males (14.7%) than females (8.6%) died of pulmonary tuberculosis, whereas more females (18.1%) than males (3%) died of HIV/AIDS. Women were more likely than men to die of stroke in this age group (Fig. 1).
Above 65 years old (n = 1,513)
All neoplasms, stroke, and pulmonary tuberculosis predominated, accounting for 15, 14, and 10% of total deaths, respectively. Cardiac diseases accounted for 10% of deaths, and 10% were indeterminate (Table 2). A higher proportion of women (16.8%) than men (11.3%) died of stroke, whereas in this age group men were more likely than women to die of pulmonary tuberculosis (Fig. 1).
The pattern of mortality as determined by the InterVA-4 model shows a high burden of infectious diseases, including HIV/AIDS, pneumonia, and pulmonary tuberculosis, in the study population. These mortality patterns are consistent with existing knowledge on the burden of disease in many parts of sub-Saharan Africa.

Discussion
The majority of previous studies that have reported CODs among rural populations in the developing world have used the physician-certified verbal autopsy (PCVA) approach to COD assignment (27–33). A VA programme based on PCVA requires the involvement of a minimum of three clinicians with skills in the method. As a result, the approach is expensive, diverts skilled personnel who are a scarce resource in less developed countries, and is a frequent bottleneck in the provision of timely data (3). Furthermore, the PCVA approach is vulnerable to bias from physician coders, who may hold preconceptions about the common patterns of death in a given community that are not necessarily evidence based (34). Such considerations have justified the more recent development of automated approaches to VA coding (25, 35). One such method is the freely available, WHO-compliant InterVA-4 model, which has recently been adopted by the INDEPTH Network of demographic surveillance sites as the only viable strategy currently available for the timely provision of comparable data from multiple populations (36). In the current study, we used this method to assign CODs in 4,460 residents of all ages of the KHDSS area on the coast of Kenya, applying the InterVA-4 model, which is aligned with the 2012 WHO VA standard, to VA data collected using the 2007 version of the WHO VA questionnaires.
In general, our COD estimates appear credible and conform to expected patterns. As anticipated, we found a slight excess of deaths among males (2,304; 52%) in comparison with females (2,156; 48%). This excess was consistent across all age groups individually, with the exception of the 15–49 years age group where, predictably, we found an excess of deaths among women, predominantly from pregnancy-related causes and HIV/AIDS. Of particular interest, external CODs such as road traffic accidents were more commonly observed among men than women. Since Kenya has recently experienced a surge in the use of motorcycles as a means of transport, and a majority of riders lack proper training, this is perhaps not surprising, but it is nevertheless a matter of considerable concern. Similarly, palm wine tapping, a common socioeconomic activity for men within the KHDSS, can sometimes result in fatal accidental falls, especially when men climb the trees having sampled the product. In the elderly (65+ years) age group, malignancies and stroke were key concerns, the latter being consistent with data from our previous study in which we reported hypertension and diabetes as major risk factors (12).
While in general the process of VA coding was straightforward, we did face some challenges with regard to the input and interpretation of our data. First, in comparison with previous studies (4, 37), a higher proportion of cases in our analysis were reported as 'indeterminate'. The majority of these cases were associated with scanty signs and symptoms obtained from the respondents, a phenomenon that was particularly common where deaths were sudden or where subjects were found dead. Further research will be needed to establish more reliable methods for COD assignment in such cases. Second, HIV prevalence varies considerably from place to place, and the model requires the user to specify whether HIV in the area is high, low, or very low, corresponding to ratios of 1:100, 1:1,000, and 1:10,000 of all deaths, respectively (25). Since the prevalence of HIV infection among well-nourished and severely malnourished children within the KHDSS has been reported at 2 and 14%, respectively (38), we set the HIV variable within the model to 'high', as described in the InterVA-4 user guide (25). We suspect, however, that HIV may be overestimated as a COD among children in the current analysis, reflecting how difficult it can be to discriminate between HIV and malnutrition on the basis of clinical features alone. By contrast, the high frequency of HIV/AIDS among adult women of reproductive age seems more plausible and is consistent with previous reports (12, 37). Third, because the InterVA-4 model is based on a more recent version of the WHO VA tools, we needed to map our input indicators from the format in which they were collected (the 2007 version of the WHO VA questionnaire) onto the relevant indicators within the InterVA-4 model, a process that proved straightforward for the majority of indicators.
Finally, although the pattern of deaths reported in our study seems generally plausible and is supported by our previous validation study conducted in adults within the same area (12), COD interpretation in the current study has not been validated by any other method. In the future, we plan to undertake further work with this aim in mind.
A major advantage of computer-based coding methods such as InterVA-4 is their potential for providing standardized data across multiple sites and over long periods of time. For this to be viable, it will be important to agree on standardized methods for data input. For example, in our own data set some questions were missing from the WHO 2007 VA instrument but were present in the WHO 2012 instrument. However, these data were partially captured in the free-text sections of the VA form, for which we developed automated keyword searches and mapped the matches onto the corresponding indicators. This was particularly important in capturing specific diagnoses such as SCD and specific forms of injury. While reviewing and modifying the questionnaires from time to time has certain advantages, it also leads to problems in the analysis and interpretation of longitudinal data. If the method is to be used successfully for comparisons between multiple sites, collaborative discussion will be needed to agree on protocols for data input and interpretation.

Conclusions
Our study suggests that both the WHO 2012 instrument and the InterVA-4 model are feasible tools to measure cause-specific mortality, which may potentially inform both health policy and program interventions in resource-limited settings. The model requires minimal time and labor resources, especially in comparison with the PCVA method. The current analysis, using InterVA-4, returned COD patterns that were generally credible at the population level and in subjects of all ages. While further work will be required to fine-tune and validate COD analysis using InterVA-4, particularly in children, we hope that our results will prove useful for informing health intervention policies both locally and internationally.

Authors' contributions
CN reviewed the literature, analyzed and interpreted the data, and drafted the paper. EB contributed to the study design, literature review, interpretation of the results, and drafting of the manuscript. GM, AM, and PK helped with VA data coding, interpretation of the results, and editing of the manuscript. BT helped with data interpretation and edited the paper. VN helped interpret the data and edit the paper. GN helped with data management and editing of the paper. AE was responsible for managing the adult hospital surveillance and helped edit the paper. PB developed the InterVA model, helped with the analysis, and edited the final version of the paper. TNW conceived the study and edited the final version of the paper. All authors read and approved the final version of the manuscript.