Impact of state weights on national vaccination coverage estimates from household surveys in Nigeria

Highlights
• Nigeria’s national survey estimates of DPT3 and MCV1 coverage have fluctuated greatly in recent years.
• Much of the variation results from differences in survey weights, not coverage.
• Both USAID DHS and UNICEF MICS allow weights to vary from round to round.
• Nigeria’s National Nutrition & Health Survey weights do not vary much because of post-stratification.
• To compare results from surveys proximate in time, use similar strata weights for clarity.


DHS weight calculation
In general, the weight associated with each child in cluster c of stratum s in a DHS survey can be calculated as

$$w_{sc} = \frac{M_s}{m_s} \times \frac{H_{sc}}{h_{sc}} \times \frac{1}{r^{(h)}_s} \times \frac{1}{r^{(w)}_s}$$

where
• $M_s$ = the total number of clusters in stratum s according to the frame.
• $m_s$ = the number of clusters sampled in stratum s.
• $H_{sc}$ = the total number of households listed in cluster c of stratum s after listing is completed.
• $h_{sc}$ = the number of households sampled in cluster c of stratum s.
• $r^{(h)}_s$ = the household response rate in stratum s.
• $r^{(w)}_s$ = the women response rate in stratum s.
Exact weight calculation varies from survey to survey depending on specific details of the sampling design. Although survey reports discuss using probability proportional to size (PPS) for cluster selection, many recent DHS surveys used simple random sampling (SRS) in the cluster selection stage. For surveys that actually used PPS, the individual child weight can be calculated as

$$w_{sc} = \frac{N_s}{m_s N_{sc}} \times \frac{H_{sc}}{h_{sc}} \times \frac{1}{r^{(h)}_s} \times \frac{1}{r^{(w)}_s}$$

where
• $N_s$ = the total population of stratum s according to the frame.
• $N_{sc}$ = the total population of cluster c of stratum s according to the frame.
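The definitions above suggest that each weight is the inverse of the product of the stage-wise selection probabilities and the response rates. The following is a minimal Python sketch under that assumption. The function names and all example numbers are hypothetical and are not taken from any survey.

```python
def srs_child_weight(M_s, m_s, H_sc, h_sc, r_h, r_w):
    """Weight when clusters are drawn by simple random sampling (SRS)."""
    cluster_prob = m_s / M_s        # chance that a cluster in stratum s is selected
    household_prob = h_sc / H_sc    # chance that a listed household is selected
    return 1.0 / (cluster_prob * household_prob * r_h * r_w)


def pps_child_weight(N_s, N_sc, m_s, H_sc, h_sc, r_h, r_w):
    """Weight when clusters are drawn with probability proportional to size (PPS)."""
    cluster_prob = m_s * N_sc / N_s  # selection chance grows with cluster population
    household_prob = h_sc / H_sc
    return 1.0 / (cluster_prob * household_prob * r_h * r_w)


# Hypothetical stratum: 200 clusters, 20 sampled; 150 households listed, 30 sampled.
w_srs = srs_child_weight(M_s=200, m_s=20, H_sc=150, h_sc=30, r_h=0.95, r_w=0.90)

# Under PPS, a cluster of exactly average size (N_s / M_s = 200 people) gets the same weight.
w_pps = pps_child_weight(N_s=40000, N_sc=200, m_s=20, H_sc=150, h_sc=30,
                         r_h=0.95, r_w=0.90)
```

The two designs agree for a cluster of average size. Under PPS, larger clusters are more likely to be selected and so receive smaller weights.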

MICS weight calculation
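In general, the weight associated with each child in cluster c of stratum s in a MICS survey can be calculated as

$$w_{sc} = \frac{M_s}{m_s} \times \frac{H_{sc}}{h_{sc}} \times \frac{1}{r^{(h)}_s} \times \frac{1}{r^{(c)}_s}$$

where $r^{(c)}_s$ is the children response rate in stratum s and the other terms are as defined for DHS. Although the technical documentation on MICS design and sampling discusses using PPS for cluster selection, all the MICS surveys we examine in this paper used SRS in the cluster selection stage.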

NNHS weight calculation
In general, the weight associated with each child in each cluster of stratum s can be calculated as

$$w_s = 1 \times g_s = g_s$$

where $g_s$ is the state-level post-stratification factor applied to the base weight, such that the final state weights after post-stratification match the population distribution figures provided by NPopC based on the 2006 census. It should be noted that the "self-weighting" in the NNHS surveys is carried out by ignoring weighting and simply assigning everyone a weight of 1. Therefore, post-stratification is necessary.

Post-stratification
We post-stratify each survey at the state level using the population proportion by state from the 2006 Nigeria Census. Specifically, the survey weight associated with each child, $w_{ij}$, is multiplied by the same multiplicative factor $g_i = N_i/N$ within each state, $w^*_{ij} = w_{ij} \times g_i$, where $N_i$ is the population of state i according to the 2006 Nigeria Census and $N$ is the overall population. Mathematically, the national coverage estimate in a survey after post-stratification is

$$\hat{p}^{PS} = \sum_i \left( \frac{\sum_j w^*_{ij} y_{ij}}{\sum_j w^*_{ij}} \times \frac{\sum_j w^*_{ij}}{\sum_{i'} \sum_j w^*_{i'j}} \right) \qquad (1)$$

$$\hat{p}^{PS} = \sum_i \left( \frac{\sum_j w_{ij} y_{ij}}{\sum_j w_{ij}} \times \frac{g_i \sum_j w_{ij}}{\sum_{i'} g_{i'} \sum_j w_{i'j}} \right) \qquad (2)$$

where $y_{ij}$ equals 1 if child j in state i is vaccinated and 0 otherwise. Note that post-stratification does not affect the state-level coverage estimates, because it scales all weights within a state up or down by the same factor, which cancels out in the first term of Equation (1). However, it does affect the state weights and the aggregate national estimate, because the common factor remains in the second term of Equation (1), as shown in Equation (2), and shifts the portion of the whole that is represented by state i.
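The cancellation argument can be checked numerically. The sketch below multiplies each child's weight by a state-level factor N_i/N, as described above, and confirms that state coverage is unchanged while the national estimate moves. The two states, their weights, vaccination flags and census populations are all made-up illustration values.

```python
def state_coverage(weights, vaccinated):
    """Weighted share of vaccinated children within one state."""
    return sum(w * y for w, y in zip(weights, vaccinated)) / sum(weights)


def national_coverage(states):
    """states maps a state name to (weights, vaccinated flags)."""
    total = sum(sum(w) for w, _ in states.values())
    # State coverage (first term) times state weight (second term), summed over states.
    return sum(state_coverage(w, y) * sum(w) / total for w, y in states.values())


def post_stratify(states, census_pop):
    """Multiply every weight in state i by g_i = N_i / N."""
    n_total = sum(census_pop.values())
    return {
        name: ([wi * census_pop[name] / n_total for wi in w], y)
        for name, (w, y) in states.items()
    }


# Hypothetical survey: "North" has low coverage and large weights.
survey = {
    "North": ([2.0, 2.0, 2.0, 2.0], [1, 0, 0, 0]),
    "South": ([1.0, 1.0, 1.0, 1.0], [1, 1, 1, 0]),
}
census = {"North": 1_000_000, "South": 3_000_000}  # made-up populations

adjusted = post_stratify(survey, census)
before = national_coverage(survey)
after = national_coverage(adjusted)
```

In this toy case each state's coverage stays at 25% and 75%, while the national estimate rises because the low-coverage state now carries a smaller share of the total weight.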

Calculation of the Relative Difference due to State Weights
For each double-digit observed difference between two consecutive surveys, we calculate the relative difference due to state weights (RDSW):

$$RDSW = \frac{(\hat{p}_k - \hat{p}_l) - (\hat{p}^{PS}_k - \hat{p}^{PS}_l)}{\hat{p}_k - \hat{p}_l}$$

where k denotes the earlier survey, l denotes the later survey, $\hat{p}$ denotes the observed national estimate calculated using the survey's original state weights, and $\hat{p}^{PS}$ denotes the post-stratified national estimate calculated using the consistent census-based population proportions. It should be noted that we use RDSW in this particular analysis to measure the portion of the observed difference between consecutive surveys that is attributable to differing state weights. It may not be an appropriate summary in every setting, because RDSW is not defined if the denominator is zero and might be difficult to interpret if (a) the denominator is very small, (b) the common-weight difference exceeds the original, or (c) the common-weight difference reverses direction from the original. In many situations, it will be clearest to simply report the original difference $(\hat{p}_k - \hat{p}_l)$ and the common-weight difference $(\hat{p}^{PS}_k - \hat{p}^{PS}_l)$.
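A minimal sketch of the calculation follows. It assumes that RDSW is the share of the original difference that disappears once both surveys use common census-based weights, which is consistent with the caveats listed above. The function name and the coverage figures are hypothetical.

```python
def rdsw(p_k, p_l, p_ps_k, p_ps_l):
    """Relative difference due to state weights between surveys k (earlier) and l (later)."""
    original = p_k - p_l            # difference using each survey's own state weights
    common = p_ps_k - p_ps_l        # difference using common census-based weights
    if original == 0:
        raise ValueError("RDSW is undefined when the original difference is zero")
    return (original - common) / original


# Hypothetical coverage estimates in percent: a 15-point observed drop
# shrinks to a 6-point drop under common weights.
share = rdsw(p_k=50, p_l=35, p_ps_k=48, p_ps_l=42)   # 0.6
```

Values below 0 or above 1 correspond to cases (b) and (c) above, which is why reporting both differences alongside RDSW is often clearer.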

Example Tables Showing Changes in State Weights
In the discussion section of the main manuscript we suggest that nationally representative surveys might consider adding a report annex to describe state weights. This supplement provides a mock-up of what that annex might have looked like for children age 12-23 months in eight of the ten surveys described in the manuscript.

Methods
For the eight surveys described in our manuscript that used a sampling frame based on the 2006 census, we obtained the microdata and summed the survey weights for children aged 12-23m in every state. We divided each state-level sum by the overall sum to obtain the so-called state-weight, which is the portion of Nigerian 12-23m old children assumed to live in each state.
Each table has several columns. The final column, (e), is the simple difference between the new survey state weights and the weights for the previous survey, regardless of whether the previous survey is a DHS, MICS or NNHS. Each number is reported to one decimal place and accompanied by a colored bar graph: blue bars for weights, green for positive differences, and red for negative differences.
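The state-weight calculation and the difference column can be sketched in a few lines of Python. This is an illustration only: the record layout, function names, state labels and numbers are assumptions, not the code used to build the tables.

```python
from collections import defaultdict


def state_weights(records):
    """records: iterable of (state, survey_weight) pairs for children aged 12-23 months.

    Returns each state's share of the total weight, in percent.
    """
    sums = defaultdict(float)
    for state, weight in records:
        sums[state] += weight
    total = sum(sums.values())
    return {state: 100.0 * s / total for state, s in sums.items()}


def weight_differences(new, previous):
    """Simple difference (new minus previous) per state, to one decimal place."""
    return {state: round(new[state] - previous[state], 1) for state in new}


# Hypothetical microdata with two states, and made-up weights from an earlier survey.
microdata = [("A", 3.0), ("A", 1.0), ("B", 4.0), ("B", 2.0)]
current = state_weights(microdata)           # A holds 40% of the weight, B holds 60%
earlier = {"A": 45.0, "B": 55.0}
diffs = weight_differences(current, earlier)
```

Because the state weights in each survey sum to 100%, the differences across states sum to zero: any weight gained by one state is lost by the others.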

Results
There are interesting differences on nearly every page: differences with the census, with the previous survey in the same family, or with the previous survey in another family. Differences between surveys in consecutive years, or in the same year, are especially unlikely to reflect true changes in population distribution.
• DHS 2008 was the first of these surveys to use the 2006 census as a sampling frame. Its weight for children age 12-23m in the state of Bauchi was much lower than that of DHS 2003. The state-level differences between DHS 2008 and MICS 2007 (only one year before) are notable with absolute value ≥ 1.0% in 13 states.
• MICS 2011 shows some notable differences with MICS 2007 and differences ≥ 1.0% with DHS 2008 for 6 states.
• DHS 2013 shows a pattern of giving additional weight to northern states compared with the census and compared with the previous DHS and the previous MICS. Several differences have magnitude > 2.0%.
• NNHS 2014 uses state weights very close to the 2006 census. Compared with the preceding DHS 2013, it gives substantially more weight to southern states than northern states, which probably helps explain why its vaccination coverage estimates were higher than those from DHS. (See Figures 4-6 in the main manuscript.)
• NNHS 2015 is very similar to NNHS 2014.
• MICS-NICS 2016 shows a notable pattern of high weights in the north and low weights in the south compared with the census, the previous MICS, and the NNHS from one year prior.
• DHS 2018 shows more weight in the north and less in the south compared with the census, but not a broad pattern of shift from the last DHS.

Strengths & Limitations
Our suggestion to add such a summary to reports that aggregate results across strata is simple to implement, especially for looking at differences within a survey family. Reports from one survey family do not usually acknowledge other families, so it may not be realistic to think that all families would agree to include the final, right-most column in each of these tables.
These tables were constructed with the benefit of hindsight, so they were stratified by northern and southern states, which accentuates some of the structure in this dataset. In a production setting, the states would more likely be listed in alphabetical order, which would carry the same information as our tables but might make patterns less obvious.
We have only shown the tables for children age 12-23 months, which is the reporting standard for vaccination coverage outcomes. A DHS, MICS or NNHS report includes numerous indicators for numerous age groups, so this suggestion would presumably result not in a single table per report but in one table for each indicator-relevant age group, perhaps five or six tables per report. In some cases the age ranges for different survey families would differ slightly, so the tables would not always be strictly comparable across families. The overall idea, however, is to make changes in survey weights obvious to interested readers so that they can ask an analyst to drill down into the microdata and quantify the portion of apparent shifts that is due to changes in state weights. If the tables show whether the weights were substantially the same as those in the census or the previous survey(s), they will likely provide enough of a clue to spark those detailed analyses.

Conclusion
It would be straightforward to incorporate tables like these into the annexes of survey reports where results are aggregated across strata to furnish a nationally representative estimate of important health indicators.

Figure 2 in the main paper shows that northern states received substantially more weight in MICS/NICS 2016-17 than NNHS 2015. This section of the supplement portrays the aggregate north-south and east-west components of state weights from the ten surveys in question by calculating a weighted average of 37 state-level geographic centroids.

Methods
The inputs to the analysis are the survey state weights (based on children aged 12-23 months) and the geo-coordinates of the spatial centroid of each of Nigeria's 37 states.
For survey k, the weight for state i is calculated as described in the main paper:

$$W_{ik} = \frac{\sum_{j=1}^{n_{ik}} w_{ijk}}{\sum_{i'=1}^{37} \sum_{j=1}^{n_{i'k}} w_{i'jk}}$$

where i indexes over 37 states and j indexes over the $n_{ik}$ respondents in state i in survey k, and $w_{ijk}$ is the weight assigned to respondent j in state i in survey k. State centroid coordinates $Lat_i$ and $Lon_i$ were calculated using a GIS shapefile of state boundaries and a user-written Stata [1] command named shp2dta [2]. For each of the ten surveys, the age 12-23 months population-weighted national centroid $LAT_k$ and $LON_k$ was calculated using the state weights and centroids:

$$LAT_k = \sum_{i=1}^{37} W_{ik} \, Lat_i, \qquad LON_k = \sum_{i=1}^{37} W_{ik} \, Lon_i$$

Figure S1 shows the national centroid coordinates alone (panel a) and with labels (panel b). All three NNHS centroids cluster closely together because of post-stratification. MICS 2007 and 2011 fall just northeast of the NNHS centroids. DHS centroids are clustered slightly farther northeast. The MICS/NICS 2016-17 centroid is substantially farther north and somewhat farther east than any of the other nine surveys.
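The weighted-centroid calculation described in the Methods can be sketched as a weighted average of state centroid coordinates. In the example below the two "states", their coordinates and their weights are hypothetical, and the function name is an assumption.

```python
def national_centroid(state_weights, centroids):
    """Weighted average of state centroids.

    state_weights: state -> weight (any positive scale; normalized here).
    centroids: state -> (latitude, longitude) in decimal degrees.
    """
    total = sum(state_weights.values())
    lat = sum(w * centroids[s][0] for s, w in state_weights.items()) / total
    lon = sum(w * centroids[s][1] for s, w in state_weights.items()) / total
    return lat, lon


# Hypothetical two-state country.
centroids = {"North": (12.0, 8.0), "South": (6.0, 7.0)}

balanced = national_centroid({"North": 0.5, "South": 0.5}, centroids)
north_heavy = national_centroid({"North": 0.75, "South": 0.25}, centroids)
```

Shifting weight toward the northern state moves the national centroid north, and slightly east here because the hypothetical northern centroid lies farther east. That is the kind of displacement the figure is meant to reveal.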

Discussion
The 2016 northward shift in state weights amplified the well-documented north-south disparity in vaccination coverage outcomes, giving the appearance of a double-digit drop in DPT3 and MCV1 from MICS 2011 to MICS/NICS 2016-17, and a double-digit drop in DPT3 when examining the apparent difference from NNHS 2015 to MICS/NICS 2016-17. If the MICS/NICS 2016-17 had used state weights more like those from the 2006 census, or more like those from any of the other nine surveys considered here, the apparent drop in coverage would have been smaller.