Mortality among populations affected by armed conflict in northeast Nigeria, 2016 to 2019

Significance The death toll of most wars and resulting crises, especially those in low-income countries, is poorly documented due to limited research and representative ground data. Consequently, humanitarian responders lack evidence upon which to argue for appropriate resources, and suffering and deaths remain uncounted, obfuscating historical narratives. We collected various existing datasets from the protracted crisis in northeast Nigeria, triggered by conflict between the Boko Haram group and authorities. Through statistical techniques, we estimated that some 490,900 people died as a result of this crisis during 2016 to 2019, with death rates in 2016 to 2017 more than twice Nigeria’s national average. Though our estimates feature large error margins, they illuminate the severity of this underfunded crisis and help memorialize its toll.


Introduction
Trends in internal displacement and insecurity Figure S1 shows trends in the number of new internally displaced persons (IDPs) due to conflict across Nigeria, as well as the numbers of people killed in Adamawa, Borno and Yobe as reported by the four different insecurity data sources we used in our study. We could not identify any published data going back to at least 2009 on the number of IDPs in the three states taken in isolation: as such, it is possible that the IDP trends graph is confounded by displacements due to other conflicts in Nigeria (e.g. northwest, southeast). Figure S1. Trends in the number of new internally displaced persons (IDPs) due to conflict and people killed per year (by source). IDP numbers are as published by the Internal Displacement Monitoring Centre [1], and include displacement anywhere in Nigeria.

Methods
Geographic strata Table S1 lists the different geographic units associated with each LGA, including state, domain (used as the sampling universe of most mortality surveys) and nearest market.  Table S2 shows key characteristics of each of the 70 SMART mortality surveys included in the analysis. Figure S2 shows the LGA-month coverage of the surveys. No surveys could be acquired in 2019 due to the time-limited extent of approvals to access data. We otherwise believe that, with the exception of a small number of surveys done by Médecins Sans Frontières in IDP settlements, we were able to secure the vast majority of mortality data available during 2016-2018.
We cleaned each dataset to remove obviously wrong entries and re-analysed each survey using a fixed-effects Poisson regression, with standard errors adjusted for intra-cluster correlation as per the R survey package. We computed a survey weight in the range 0 to 1 for each observation, equal to the anthropometric quality score automatically generated by ENA software during validation of each survey (this score considers various patterns in the data suggestive of adequate staff training and fieldwork [2]; we normalised scores to 1; see Figure S3), multiplied by the proportion of the intended sampling universe that was actually reached during the survey, as per survey report (or 1.0 if the entire sampling universe was reached). Figure S2. Coverage of SMART mortality surveys, by state and LGA. Heat colours denote months falling within the recall period of one or more surveys, with increasing colour intensity proportional to a data availability index corresponding to (proportion of month covered) × (survey person-time falling within the LGA-month) × (survey quality score) × (survey sampling coverage), rescaled to unity to provide a relative scale for the amount of information available for each survey-month. Grey LGA-months are those not covered by any survey. Figure S3. Distribution of quality score of SMART mortality surveys included in the analysis.

Reconstruction of population denominators
Table S3 lists the different sources of information we drew on to estimate the number of people present in any LGA during any given month, as well as the number of IDPs and refugees leaving, arriving in (or returning to) and currently displaced within each LGA. We defined IDPs as people who leave their home due to crisis conditions and move to somewhere else within the same LGA, to another LGA within Adamawa, Borno and Yobe or even outside of the three states, but not crossing an international border. By contrast, refugees do cross an international border.
Briefly, for each LGA we forward- or back-calculated population from the time point at which each of the alternative population estimate sources was centred, using an equation whose terms are total population, the assumed growth rate, and the numbers of internally displaced persons and refugees (each split into those arriving from another LGA and those departing to another LGA). We then took a weighted average of the four time series arising from each population source, with the weight corresponding to a quality score composed of various criteria [6]. Lastly, we computed the population of children under 5y using the mean proportion in this age group (20%) reported by the SMART surveys we analysed.
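The forward and back calculation can be illustrated with a minimal Python sketch. The exact equation is not reproduced here, so the update rule below (growth applied to the previous month's population, plus net IDP and refugee movements) is an assumption, as are all function and parameter names; only the 20% under-5y share and the use of a quality-weighted average come from the text.

```python
# Illustrative sketch only: the update rule is an ASSUMED form,
# not the authors' published equation.

UNDER5_SHARE = 0.20  # mean proportion of children under 5y reported in the text


def step_forward(pop, growth, idp_in, idp_out, ref_in, ref_out):
    """Population one month later, given the monthly growth rate and flows."""
    return pop * (1.0 + growth) + idp_in - idp_out + ref_in - ref_out


def step_back(pop_next, growth, idp_in, idp_out, ref_in, ref_out):
    """Inverse of step_forward: population one month earlier."""
    return (pop_next - idp_in + idp_out - ref_in + ref_out) / (1.0 + growth)


def weighted_average(estimates, weights):
    """Combine estimates from several population sources using quality weights."""
    total_weight = sum(weights)
    return sum(e * w for e, w in zip(estimates, weights)) / total_weight


def under5(pop):
    """Population of children under 5y, using the fixed share."""
    return pop * UNDER5_SHARE
```

Stepping forward and then back with the same inputs should return the starting population, which is a convenient check on any alternative form of the update rule.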
In order to estimate flows, we first aggregated IDP groups within each LGA by their year of arrival and LGA of origin, as per the DTM database. Setting 1 Jan 2015 as the start date for population reconstruction, we assumed that everyone who had arrived before then was present at the start date; similarly, further arrivals during 2015 were present as of Jan 2016; and so on. We then interpolated prevalent IDP data as reported by each successive round of assessment, so as to construct monthly time series of IDPs by LGA of arrival and LGA of origin, which we used to compute flows from each LGA to other LGAs for each month. We then used these flow time series in the above equation.
As a final adjustment, we applied a correction to IDP time series whenever reconstructed population for any LGA and any alternative population source became negative, presumably because of systematic error in IDP counting (this occurred in 12/65 LGAs). The correction downward-scaled the time series so as to achieve a minimum population of 0 for all population sources and LGAs. This was done by constraining the ratio of number of IDPs in the LGA to LGA population to ≤ 1. Figure S4 shows trends in the estimated population by LGA, showing that the least stable patterns occurred in Borno state, where large waves of displacement and return occurred during the period. Figure S5 shows, at state level, the differences in estimates derived from each of the four population sources: notably, these differences were not consistent across the three states. Lastly, Figure S6 shows the percentage of each state's population that consisted of IDPs, indicating a relatively stable pattern over the period. We separately estimated that there were 138,000 refugees from Adamawa, Borno and Yobe in neighbouring countries (Niger, Chad, Cameroon) as of January 2015. Refugee numbers had risen progressively to 239,000 by December 2019. Figure S4. Evolution of estimated population, by LGA. Figure S5. Evolution of estimated population, by alternative population source and state. Figure S6. Evolution of the estimated proportion of the population that are IDPs, by state.
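One way to picture the IDP correction is the sketch below, which scales an LGA's whole IDP series down by a single factor so that the IDP-to-population ratio never exceeds 1. The single-factor approach, the choice of reference population series and the function name are assumptions made for illustration; the text specifies only the ≤ 1 constraint.

```python
# Illustrative sketch only: uniform down-scaling is an ASSUMED way of
# enforcing the IDP / population <= 1 constraint described in the text.

def scale_idp_series(idps, population):
    """Scale a monthly IDP series so that IDPs never exceed population.

    idps and population are equal-length lists of monthly values for one LGA
    (hypothetical inputs). The series is returned unchanged if the constraint
    already holds.
    """
    ratios = [i / p for i, p in zip(idps, population) if p > 0]
    worst = max(ratios, default=0.0)
    if worst <= 1.0:
        return list(idps)
    factor = 1.0 / worst  # brings the largest ratio down to exactly 1
    return [i * factor for i in idps]
```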
Completeness of predictor datasets Figure S7 and Figure S8 show the completeness of candidate predictor datasets, by month (i.e. proportion of LGAs that have complete data for any given month) and LGA (i.e. proportion of months for which a given LGA has complete data), respectively.
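The two completeness measures can be sketched in Python as follows, for a single predictor. The data layout (a dictionary keyed by LGA and month, with None or a missing key meaning no data) and the function name are assumptions made for illustration.

```python
# Illustrative sketch only: the data layout and names are hypothetical.

def completeness(values, lgas, months):
    """Return (by_month, by_lga) completeness proportions for one predictor.

    values maps (lga, month) to an observation; a missing key or None
    counts as incomplete.
    """
    def present(lga, month):
        return values.get((lga, month)) is not None

    # Proportion of LGAs with data in each month
    by_month = {m: sum(present(l, m) for l in lgas) / len(lgas) for m in months}
    # Proportion of months with data for each LGA
    by_lga = {l: sum(present(l, m) for m in months) / len(months) for l in lgas}
    return by_month, by_lga
```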

Management of predictor datasets
Market prices Data management. The World Food Programme's dataset of market prices consists of monthly observations from a purposive selection of markets across Nigeria, with relatively more markets being followed in crisis-affected states. While data are available from 2002, in Adamawa, Borno and Yobe states they are very sparse prior to 2015.
We selected the most frequently reported-on commodities in the dataset: gasoline fuel, bread, white cowpea, white maize, millet, local rice, imported rice and white sorghum. For some of these, both wholesale and retail prices were available.
We computed an inflation-adjusted standardised price for 1 kg of each cereal staple, 1 L of fuel and 1 loaf of bread as follows. First, we divided reported cereal prices by their unit to come up with a standardised price per kg. Second, we converted the price to USD equivalent using historical monthly NGN to USD exchange rates obtained from https://fxtop.com/en/historical-exchange-rates.php . Lastly, we adjusted the price for USD inflation (https://www.in2013dollars.com/us/inflation/2010?amount=1) to come up with a price in 2010 USD.
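The three standardisation steps can be sketched as a short Python function. The parameter names and the example figures are hypothetical, and the inflation adjustment is assumed to take the form of a division by the ratio of the price level in the observation month to the 2010 price level.

```python
# Illustrative sketch only: names and example values are hypothetical.

def price_2010_usd(price_ngn, unit_kg, ngn_per_usd, price_level_ratio_to_2010):
    """Convert a reported cereal price into 2010 USD per kg.

    price_ngn: reported price in naira for a unit of unit_kg kilograms
    ngn_per_usd: exchange rate for the observation month (naira per 1 USD)
    price_level_ratio_to_2010: USD price level in that month divided by the
        2010 price level (assumed form of the inflation adjustment)
    """
    per_kg_ngn = price_ngn / unit_kg            # step 1: standardise to 1 kg
    per_kg_usd = per_kg_ngn / ngn_per_usd       # step 2: convert to USD
    return per_kg_usd / price_level_ratio_to_2010  # step 3: deflate to 2010 USD


# Hypothetical example: a 100 kg bag at 30,000 NGN, 300 NGN per USD,
# and prices 20% above their 2010 level.
example = price_2010_usd(30000.0, 100.0, 300.0, 1.2)
```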
While 12 individual markets were featured in the database for Adamawa, Borno and Yobe states, in practice only five had continuous or at least intermittent data coverage from 2015 onwards: Mubi (Adamawa), Biu and Maiduguri (Borno), Damaturu and Potiskum (Yobe). We assigned each LGA to one of these markets based on proximity criteria (Table S1). Each LGA price series was subjected to moderate spline smoothing to correct for possible data entry errors and interpolate between missing observations.
Setting counterfactual values. After inspecting the market-specific time series of maize price (Figure S9), we decided for our most likely scenario to take the median values for each market (and thus the LGAs attributed to the market) during the low-price periods of January to June 2015 and 2018-2019: this reflects an implicit decision that the price increase should be considered part of the crisis conditions affecting northeast Nigeria. We took the minimum of each market time series as the best-case scenario, and the median of the entire series between 2015-2019 (which therefore includes the price increase period) as the worst-case scenario.
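The three counterfactual scenarios can be sketched for a single market as follows. The data layout (a dictionary keyed by year and month) and the function name are assumptions; the period definitions follow the text.

```python
# Illustrative sketch only: the data layout and names are hypothetical.
from statistics import median


def counterfactual_prices(series):
    """Return the three counterfactual price levels for one market.

    series maps (year, month) to the smoothed price for that month.
    """
    # Low-price periods: January to June 2015, and all of 2018-2019
    low = [p for (y, m), p in series.items()
           if (y == 2015 and m <= 6) or y in (2018, 2019)]
    # Entire 2015-2019 series, including the price increase period
    full = [p for (y, m), p in series.items() if 2015 <= y <= 2019]
    return {
        "most_likely": median(low),
        "best_case": min(series.values()),
        "worst_case": median(full),
    }
```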

Vaccination coverage
Data management. Nigeria's Vaccination Tracking System (VTS) was developed to help national and state emergency polio operations centres to estimate vaccination coverage and identify areas in need of supplemental vaccination activities. The VTS equips vaccination outreach teams with GPS devices that track the teams' movements. During their field visits, these teams also collect geographic and demographic information on settlements they visit, which is used not only to estimate vaccination coverage, but also to update population estimates. 'Vaccination geo-coverage' during any given time window is quantified as the proportion of 100 m² grids in targeted areas that are visited, based on GPS tracks. The system has more recently been used during seasonal malaria chemoprophylaxis campaigns, and as such more generally produces estimates of the geo-coverage of community-based public health interventions.
We downloaded the 'time-trend-analysis' dataset from the now-defunct VTS website (http://vts.eocng.org/ChronicalCoverage/Index ). The dataset is organised by ward (sub-unit within LGA) and campaign (e.g. August 2017 polio mass vaccination): for each such instance, the dataset reports geo-coverage (estimated as above) and the number of 'geo-coverage denominators', which are roughly equivalent to similar-size population units, and which we used as weights to come up with a weighted geo-coverage by LGA and month. We then did automatic imputation to infer missing values of geo-coverage, which occurred mainly over the intervals between successive campaigns: for this we used the Random Forest method as implemented by the R mice package [7] (https://amices.org/mice/), with 5 chains of 20 iterations, and using all available predictors to inform the imputation. Geo-coverage was predicted using generalised additive modelling, with LGA as a random effect and unity-normalised geo-coverage denominator weights. Model-predicted geo-coverage is shown in Figure S10. Figure S10. Estimated vaccination geo-coverage as predicted using generalised additive modelling from VTS data, for Adamawa, Borno and Yobe and other states, respectively. Shaded areas represent 95% confidence intervals.
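The aggregation of ward-level geo-coverage to LGA and month, weighted by geo-coverage denominators, can be sketched in Python as below. The record layout and function name are assumptions made for illustration, and the sketch covers only the weighting step, not the imputation or the modelling.

```python
# Illustrative sketch only: the record layout and names are hypothetical.
from collections import defaultdict


def weighted_geo_coverage(records):
    """Aggregate ward-campaign records to a weighted geo-coverage per LGA-month.

    Each record is (lga, month, geo_coverage, denominators), where
    denominators is the number of geo-coverage denominators used as weight.
    """
    numerator = defaultdict(float)
    weight = defaultdict(float)
    for lga, month, coverage, denominators in records:
        numerator[(lga, month)] += coverage * denominators
        weight[(lga, month)] += denominators
    # Skip LGA-months with zero total weight to avoid division by zero
    return {k: numerator[k] / weight[k] for k in numerator if weight[k] > 0}
```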

Results
Predictive model Figure S11 and Figure S12 show the model's out-of-sample predictive accuracy on cross-validation. In practice, both graphs represent a worst-case scenario in which prediction always takes place in LGAs that the model has not been trained on (i.e. new levels of the random effect). Because survey coverage in Nigeria is very high, in reality both models predict deaths mostly for 'within-sample' LGAs. Figure S11. Predictive accuracy of the model for CDR on ten-fold cross-validation. Each blue dot shows the number of observed and predicted deaths by LGA. The red line shows perfect prediction. Figure S12. Predictive accuracy of the model for U5DR on ten-fold cross-validation. Each blue dot shows the number of observed and predicted deaths by LGA. The red line shows perfect prediction.
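The worst-case nature of this cross-validation, in which held-out observations always come from LGAs absent from the training data, can be illustrated with a grouped fold assignment. The round-robin allocation, the row layout and the function names below are assumptions made for illustration; only the ten-fold design and the holding out of whole LGAs are taken from the text.

```python
# Illustrative sketch only: fold allocation scheme and names are hypothetical.

def lga_folds(lgas, k=10):
    """Assign each distinct LGA to one of k folds (round-robin)."""
    unique = sorted(set(lgas))
    return {lga: i % k for i, lga in enumerate(unique)}


def split_by_fold(rows, fold_of, fold):
    """Split rows so that every LGA in the test set is unseen in training.

    rows is a list of dicts, each with an "lga" key.
    """
    train = [r for r in rows if fold_of[r["lga"]] != fold]
    test = [r for r in rows if fold_of[r["lga"]] == fold]
    return train, test
```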

Discussion
Sensitivity analysis: bias in displacement and population data Figure S13 shows how the central estimate (likely counterfactual scenario) of total and excess death toll varies as a function of potential bias in both the four base population estimates and the number of IDPs as reported in the DTM database. Figure S13. Sensitivity of the estimated total (orange) and excess (pink) death toll (as the point estimate given the most likely counterfactual scenario), for both all ages and children under 5y, to varying levels of bias in input population and IDP data. Each sensitivity value is a ratio of actual to observed values, i.e. a multiplier applied to the observed data (accordingly, values < 1 imply that the observed data over-estimate the true figures, and vice versa). Sensitivity values for the ratio of true to observed population figures are indicated within the plots to the left of the corresponding output estimates.
Sensitivity analysis: Under-estimation in U5DR Figure S14 shows how the central estimates of total and excess death tolls, by age group, vary with increasing percentages of under 5y deaths that were not detected during SMART surveys. Figure S14. Sensitivity of the estimated total and excess death toll (as point estimates given the most likely counterfactual scenario), for both all ages and children under 5y, to varying levels of bias in input U5DR survey data. Each sensitivity value is a percentage of under 5y deaths that may have been missed during survey interviews.
Trends in the rate of people killed, by accessibility status of LGAs Figure S15 shows reported rates of people killed (combining the different insecurity monitoring project datasets with our reconstructed population denominators) by source and by whether the LGA was fully accessible or not during any given month (we determined accessibility based on document review: see Table 1, main text). Figure S15. Trends in the monthly rate of people being killed according to each of the four insecurity monitoring projects for which data were available, and by whether LGAs were accessible or partially / fully inaccessible.
Sensitivity analysis: potential inaccessibility bias Figure S16 shows how the total and excess estimated death toll varies if one assumes varying levels of bias in the estimated CDR for LGAs that were partly or completely inaccessible during any given month. As expected, results are highly sensitive to this potential bias in Borno state, where most LGAs were at least partially inaccessible during the analysis period. Figure S16. Estimated total and excess death toll, by age group and state, for increasing values of potential inaccessibility bias (each value is a multiplier for the estimated CDR, applied for LGA-months where accessibility was partial or none).