Monitoring childbirth care in primary health facilities: a validity study in Gombe State, northeastern Nigeria

Background Improving the quality of facility-based births is a critical strategy for reducing the high burden of maternal and neonatal mortality and morbidity across all settings. Accurate data on childbirth care are essential for monitoring progress. In northeastern Nigeria, we assessed the validity of childbirth care indicators in a rural primary health care context, as documented by health workers and reported by women at different recall periods.
Methods We compared birth observations (gold standard) to: (i) facility exit interviews with observed women; (ii) household follow-up interviews 9-22 months after childbirth; and (iii) health worker documentation in the maternity register. We calculated sensitivity, specificity, and area under the receiver operating characteristic curve (AUC) to determine individual-level reporting accuracy. We calculated the inflation factor (IF) to determine population-level validity.
Results Twenty-five childbirth care indicators were assessed to validate health worker documentation and women’s self-reports. During exit interviews, women’s recall had high validity (AUC≥0.70 and 0.75<IF<1.25) for 9 of 20 indicators assessed; six additional indicators met either the AUC or the IF criterion for validity. During follow-up interviews, women’s recall had high validity for one of 15 indicators assessed (placing the newborn skin-to-skin); two additional indicators met the IF criterion only. Health worker documentation had high validity for four of 10 indicators assessed; three additional indicators met either the AUC or the IF criterion.
Conclusions In addition to standard household surveys, monitoring of facility-based childbirth care should consider drawing from and linking multiple data sources, including routine health facility data and exit interviews with recently delivered women.

The childbirth process presents a time of great risk of death for women and their newborns [1,2]. Of the estimated 303 000 maternal deaths and 2.5 million neonatal deaths that occurred in 2015, 113 000 maternal deaths and over 1 million neonatal deaths were attributed to complications from childbirth and the immediate postpartum period [3,4]. The distribution of this risk of death is uneven. While 36% of the world's population lives in sub-Saharan Africa and Southern Asia, these regions account for 86% of maternal deaths and at least 78% of the newborn deaths [1][2][3][4][5]. For facility-based births, improving the quality of care for women and newborns, especially during the intrapartum period, is considered one of the most effective strategies for reducing maternal and neonatal mortality and morbidity across all settings [1,6-10].
Global and national monitoring of facility-based care often includes self-reported retrospective data collected in household surveys such as the Demographic and Health Survey (DHS) and Multiple Indicator Cluster Survey (MICS) [11][12][13]. For population-based coverage estimates of childbirth care, these periodic and nationally representative surveys collect a limited set of data which include maternal background characteristics and birth history, delivery by a skilled birth attendant, and newborn care practices [14,15]. A small number of criterion validity studies of childbirth care, which measured the extent to which women's self-reported data at different recall periods align with a gold standard, have demonstrated mixed results on the accuracy of data in household surveys [16][17][18][19][20][21][22]. Understanding how best to accurately monitor childbirth care is an emerging research priority and evidence from different contexts is required [23,24].
Routine data can be used to monitor the content of facility-based care, but concerns about completeness, consistency, and accuracy have hampered their use [13]. Most studies on the accuracy of routine data have focused on verifying the aggregate data reported by facilities to higher management levels and comparing these to data documented by health workers [25][26][27][28][29][30]. However, similar to the population-based surveys, the extent to which the data documented by health workers reflect the "truth" of care is also not well-established [31].
In the high mortality setting of northeastern Nigeria, we assessed the extent to which different data recording methods could contribute to the global- and national-level monitoring of maternal and newborn health. Using direct birth observations as a gold standard, we compared these observations to: (i) facility exit interviews with women after childbirth; (ii) household follow-up interviews with women nine to 22 months after childbirth; and (iii) health worker documentation of childbirth events in the facility maternity register.

Study setting
Gombe State, northeastern Nigeria, has high maternal and newborn mortality at 814 per 100 000 live births and 35 per 1000 live births, respectively; nationally, maternal mortality estimates are also 814 per 100 000 live births and neonatal mortality estimates are 39 per 1000 live births [3,4,14,15]. Gombe is predominantly rural and 44% of the population have some primary school education. Most women access maternity care through public facilities. Seventy-two percent of women reported at least one antenatal care visit during their last pregnancy and 29% gave birth in a health facility [15]. In 2018, over 70% of facility deliveries took place in rural primary health facilities [32].

Indicator selection
Twenty-five indicators were selected, focusing on the content of childbirth care (Table 1): skilled birth attendance and companionship during labor and delivery; care for the woman (maternal background characteristics, provider practices and respectful care, clinical care); and care for the newborn (immediate postnatal care and newborn outcomes). To select these indicators, we referred to the Ending Preventable Maternal Mortality and Every Newborn Action Plan strategy documents for priority indicators to monitor progress towards Sustainable Development Goals targets [33,34]. We also sought to complement indicators collected in the Nigeria Demographic and Health Survey as well as earlier studies validating childbirth care indicators [14,16-20].
In Gombe, maternity registers defined essential newborn care as the immediate initiation of breastfeeding and the baby being kept warm within 30 minutes of birth [35]. To determine if the maternity register provided a sufficient approximation to globally-defined indicators, we compared the maternity register's essential newborn care data to being kept warm and the initiation of breastfeeding within the first hour of birth [34]. For validation analyses, the following indicators were converted into binary variables: maternal age at delivery (adolescent births); prior parity (prior parity, four or more births); and baby's birthweight (low birthweight, <2500 g).

Study sites and data sources
As part of an initiative to improve care in Gombe State, data were collected between 2016 and 2018, including facility-based birth observations [36]. A summary of each data recording method is provided in

Birth observations
[Table 1 body not recovered. Surviving column headings: indicator; birth observation; facility exit interview; household follow-up interview.]
*Observed women were interviewed before discharge from the facility (exit interview) and at home nine to 22 months after childbirth (follow-up interview). Health workers documented childbirth events in facility maternity registers.
†For validation analyses, the following indicators were converted into binary variables: age at delivery (adolescent births); prior parity (prior parity, four or more births); and baby's birthweight (low birthweight, <2500 g).
‡In the facility maternity register, essential newborn care is a composite indicator for (i) immediate initiation of breastfeeding and (ii) baby kept warm.

Starting in June 2016, five rounds of birth observations took place in 10 primary health facilities. Each round took place roughly every six months and lasted three weeks. To select the facilities for birth observations, a state-wide random sample of 107 facilities was drawn in November 2015 from approximately 500 government-owned primary health facilities. The maternity registers were reviewed to determine the volume of births occurring in the previous six months. The 10 facilities with the highest number of births were selected for birth observations [37]. An average of 15.7 births (standard deviation SD = 12.0) occurred per month in the 10 primary health facilities, compared to the state-level average of 4.3 births (SD = 6.3) per month in primary health facilities [38].
All women attending the facility for delivery were invited to participate, excluding women admitted for monitoring before the onset of labor. Women were given a description of the study and the procedures, including the right to withdraw participation at any time. A trained observer (local midwives, not employees of the assigned facility) stayed in the same room to continuously document labor and delivery processes through the first hour after birth, using a structured checklist. Labor and delivery took place in the same room. The mother and newborn were usually kept together until discharged from the facility.
Two observers and one clinical supervisor were assigned per facility to work in shifts and cover all deliveries. Although observers were trained midwives, they had no legal right to intervene in clinical care during the observation period because they were not employed in the facilities where they were doing the observations. At all times during the observation, the observer prioritized safety of the mother and newborn over data collection; protocols were established on how to seek help in the event of any life-threatening event. Priorities for the supervisor were (i) to ensure that consenting procedures were carried out; (ii) to observe data collection and carry out interrater reliability checks; (iii) to assist in the case of a query from facility employees or from clients and families; and (iv) to collect and check digital data at the end of each day.
Before each round, observers underwent four days of practical training covering unobtrusive observation, safety and confidentiality protocols, and consistency of rating between observers.
Observations were recorded onto a Lenovo A3300 tablet using CSPro version 7.0 (United States Census Bureau and ICF Macro, Suitland, MD, USA). Each observed woman was assigned a unique observation number to facilitate linking information to other data sets.

Facility maternity registers
Following the birth observation, regardless of newborn outcome, the observer extracted data about the woman from the maternity register. Data extraction took place on the same day as the observed birth after the first hour of birth. Data were directly entered into the tablet.

Facility exit interviews
Women were usually discharged within 24 hours of delivery. Each observed woman leaving the facility with a live newborn was invited to participate in an exit interview. The exit interview covered information recorded during the observation and was harmonized with questions asked in the DHS and MICS. Each interview was conducted in Hausa by a member of the observation team assigned to the facility. Interview questions are available in Table S1 in the Online Supplementary Document.

Household follow-up interviews, nine to 22 months after childbirth
In addition to recall during exit interviews, it was of interest to understand the validity of women's recall in the context of household surveys, such as DHS and MICS. For this purpose, we conducted household-level follow-up interviews in which a subset of the observed women were asked to recall childbirth events. To represent a range of recall periods that may be encountered during a household survey, in March 2018 we selected approximately 150 women from each of the first three rounds of birth observations, which occurred in June 2016 (22 months recall), March 2017 (15 months recall), and August 2017 (9 months recall); this selection was done by a simple random sample of a de-identified list of women observed per round. Each interview was conducted in Hausa and the women were asked the same questions as in the exit interview.

Sample size
To estimate the sample size, 50% prevalence from clinical observations (gold standard) was set for all indicators as we expected variability in the frequency of indicators. Sensitivity was set at 60% ± 7% precision and specificity at 70% ± 7% precision. Type 1 error was set at 0.05, assuming a normal approximation to a binomial distribution. Thus, a minimum sample size of 400 was required for observed women at exit interviews, at follow-up interviews, and in the maternity register.
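The figure of 400 can be approximately reproduced with a precision-based calculation of the kind commonly used for diagnostic accuracy studies. The sketch below assumes a Buderer-style formula under the normal approximation; the text does not name the formula used, and the function name and arguments are illustrative only.

```python
import math
from statistics import NormalDist


def validation_sample_size(expected_value, precision, share_of_sample, alpha=0.05):
    """Total sample needed to estimate sensitivity or specificity to +/- precision.

    Assumption: normal approximation to the binomial (Buderer-style formula).
    share_of_sample is the fraction of the sample that contributes to the
    estimate: the prevalence for sensitivity, 1 - prevalence for specificity.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    contributing = z ** 2 * expected_value * (1 - expected_value) / precision ** 2
    return math.ceil(contributing / share_of_sample)


prevalence = 0.50  # gold-standard prevalence assumed for all indicators
n_sensitivity = validation_sample_size(0.60, 0.07, prevalence)
n_specificity = validation_sample_size(0.70, 0.07, 1 - prevalence)
print(n_sensitivity, n_specificity)
```

Under these assumptions the larger of the two requirements falls below 400, which is consistent with 400 being a rounded-up minimum.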

Analysis
To combine the data from five rounds of data collection, we tested for marginal homogeneity using Yang's chi-square test for clustered binary matched pair data using the clust.bin.pair package in R [39,40]. Of the 45 matched pairs analyzed (see Table 1), one indicator showed evidence of clustering across time when comparing birth observations and women's self-reports at exit and follow-up interviews: birth attendant washed hands with soap before examinations. Given the number of matched pairs analyzed, we considered there to be sufficient evidence that the data collection rounds could be combined.
Validation analyses were performed using Stata 14.2 (Stata Corp, College Station, TX, USA) [41]. Using birth observations as the gold standard, we assessed each indicator's validity at the individual and population levels.
To measure individual-level reporting accuracy, we constructed three two-by-two tables for each indicator which compared the birth observation to each data recording method [16,[18][19][20]23]. Missing and "don't know" responses were excluded from the two-by-two tables. We calculated percent agreement between the birth observation and each data recording method.
For two-by-two tables with at least five observations per cell, we calculated the sensitivity (true positive rate) and specificity (true negative rate) for each indicator. We quantified the area under the receiver operating characteristic curve (AUC) and estimated 95% confidence intervals (CI) assuming a binomial distribution. AUC values range from 0 to 1, with 0.5 representing a random guess and 1 representing complete accuracy. An AUC value of 0.7 or higher was chosen as the cut-off criterion for high individual-level reporting accuracy [23].
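The individual-level measures can be sketched in a few lines. The example below is an illustration rather than the Stata code used in the study: the class and function names are hypothetical, the counts are invented, and the AUC is taken as the mean of sensitivity and specificity, which is what a single-threshold ROC curve for a binary indicator yields under the trapezoidal rule. Confidence intervals are omitted.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts comparing one data source with the birth observation (gold standard)."""
    true_pos: int   # observed and reported
    false_neg: int  # observed but not reported
    false_pos: int  # not observed but reported
    true_neg: int   # neither observed nor reported


def accuracy_measures(table, min_cell=5, auc_cutoff=0.70):
    """Return percent agreement and, if cells are large enough, Se, Sp and AUC."""
    cells = (table.true_pos, table.false_neg, table.false_pos, table.true_neg)
    result = {"agreement": (table.true_pos + table.true_neg) / sum(cells)}
    if min(cells) < min_cell:
        return result  # too sparse for sensitivity, specificity and AUC
    sensitivity = table.true_pos / (table.true_pos + table.false_neg)
    specificity = table.true_neg / (table.true_neg + table.false_pos)
    auc = (sensitivity + specificity) / 2  # binary indicator, single ROC point
    result.update(
        sensitivity=sensitivity,
        specificity=specificity,
        auc=auc,
        high_individual_accuracy=auc >= auc_cutoff,
    )
    return result


# Invented counts for illustration only, not study data.
example = accuracy_measures(TwoByTwo(true_pos=80, false_neg=20, false_pos=30, true_neg=70))
sparse = accuracy_measures(TwoByTwo(true_pos=95, false_neg=2, false_pos=1, true_neg=2))
print(example)
print(sparse)
```

Returning only the percent agreement for sparse tables mirrors the rule that sensitivity, specificity, and AUC are reported only when every cell holds at least five observations.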
To measure the population-level validity, we calculated each indicator's inflation factor (IF), which is the ratio of the estimated population-based survey prevalence to the gold standard's prevalence. The IF reflects the degree to which an indicator would be over- or under-estimated in a population-based survey. To estimate the population-based survey prevalence, we used the following equation [42]: estimated population survey prevalence = (gold standard prevalence × sensitivity) + [(1 - gold standard prevalence) × (1 - specificity)]. An IF value between 0.75 and 1.25 was chosen as the cut-off criterion for low population-level bias [23].
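The inflation factor calculation can be expressed directly from the equation above. This is a minimal sketch with hypothetical function names and invented input values; it is not the study's analysis code.

```python
def estimated_survey_prevalence(prevalence, sensitivity, specificity):
    """Prevalence a survey would report: true positives plus false positives."""
    return prevalence * sensitivity + (1 - prevalence) * (1 - specificity)


def inflation_factor(prevalence, sensitivity, specificity):
    """Ratio of the estimated survey prevalence to the gold-standard prevalence."""
    return estimated_survey_prevalence(prevalence, sensitivity, specificity) / prevalence


def low_population_bias(inflation, lower=0.75, upper=1.25):
    """Apply the cut-off used in the text: 0.75 < IF < 1.25."""
    return lower < inflation < upper


# Invented values for illustration only, not study data.
inflation = inflation_factor(prevalence=0.50, sensitivity=0.80, specificity=0.70)
print(round(inflation, 2), low_population_bias(inflation))
```

An IF above 1 means the survey would overestimate coverage relative to the gold standard, and an IF below 1 means it would underestimate it.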

Sample description
Characteristics of the women observed during childbirth are presented in Table 2. Women's age ranged from 15 to 47 years, with a median age of 24 years (interquartile range (IQR) = 20-28). Forty-four percent of women had at least 4 prior deliveries, 47% of women had no formal education, and 99% were married.
For each indicator and data recording method: indicator prevalence, "don't know" responses, percent agreement with gold standard, sensitivity, specificity, AUC, and IF values are summarized in Table 3. Figure 2 presents a summary of the validity criteria met across data recording methods.

"Don't know" responses, which indicate the extent to which recall may or may not be possible, were greater than 5% for: birth attendant washed hands with soap before examinations (exit and follow-up); baby weighed at birth (exit and follow-up); and low birthweight (exit only). Health workers documented in maternity registers most frequently for: baby weighed at birth (99% completeness), maternal age at delivery (97%), and prior parity (97%). Documentation was least frequent for the composite indicator essential newborn care (82% completeness) and pre-term birth (77%).

Skilled birth attendance and companionship during labor and delivery
Health worker documentation of the main provider's cadre had high overall validity, meaning AUC≥0.70 for high individual-level accuracy and 0.75<IF<1.25 for low population-level bias. During exit interviews, women's recall had high overall validity for the presence of more than one provider at birth and high individual-level accuracy for the main provider's cadre and the presence of a support person during labor and delivery. During follow-up, women's recall for these three indicators met neither validity criterion.

Care for the woman
Health worker documentation in maternity registers had high overall validity for maternal age at delivery and prior parity and high individual-level accuracy for reporting the use of a partograph. While there was insufficient variation in responses for validation analysis, health worker documentation had near complete agreement with the gold standard for the administration of a prophylactic uterotonic.
During exit interviews, women's recall on four provider respectful care indicators met at least one validity criterion, with high overall validity for two indicators: allowed to move and change positions during labor and allowed to have a support person during labor and delivery. During follow-up, women's recall of being allowed to have a support person maintained low population-level bias only.
During exit interviews, women's report of clinical care received had high overall validity for having their blood pressure taken before delivery and low population-level bias only for the administration of a prophylactic uterotonic. During follow-up, only administration of a prophylactic uterotonic maintained low population-level bias.

Care for the newborn
For two indicators requiring the mother's involvement, immediate initiation of breastfeeding and placing the newborn skin-to-skin, women's recall during exit interviews had high overall validity. During follow-up, women's recall of the baby being placed skin-to-skin maintained high overall validity, whereas recall of immediate breastfeeding met neither validity criterion. Health worker documentation of these practices as a composite indicator of essential newborn care met neither validity criterion; health workers documented a 95% prevalence for being kept warm and initiation of breastfeeding within 30 minutes of birth whereas birth observations documented 39% prevalence for these practices within one hour of birth.

[Table 2 body not recovered; surviving notes follow.]
CHEW = community health extension worker.
*Distribution of characteristics based on the 1774 respondents during exit interviews. Percentages do not always add up to 100% due to rounding, missing responses (up to 1.1%), and "don't know" responses (0.2%).
†"Age of client at delivery" had 1 (0.1%) missing response and 4 (0.2%) "don't know" responses.
‡"Prior parity" had 6 (0.3%) missing responses.
§"Time of delivery" had 19 (1.1%) missing responses.
‖"Day of delivery" had 13 (0.7%) missing responses.
¶"Main provider during labor and delivery" had 1 (0.1%) missing response.

Figure 2. Summary of childbirth care indicator validity criteria across data recording methods. Observed women were interviewed before discharge from the facility (exit interview) and at home nine to 22 months after childbirth (follow-up interview). Health workers documented childbirth events in facility maternity registers. AUC = area under the receiver operating characteristic curve; IF = inflation factor; >5%dk = >5% "don't know" responses; <5/cell = fewer than 5 observations per cell in the two-by-two table validating the data recording method against the gold standard; AUC criterion for high individual-level reporting accuracy: AUC≥0.7; IF criterion for low population-level bias: 0.75<IF<1.25.

[Table 3 body not recovered; surviving notes follow.]
*In the facility maternity register, essential newborn care is a composite indicator for (i) immediate initiation of breastfeeding and (ii) baby kept warm.
§For validation analyses, the following indicators were converted into binary variables: age at delivery (adolescent births); prior parity (prior parity, four or more births); and baby's birthweight (low birthweight, <2500 g).
‖In the facility maternity register, essential newborn care is a composite indicator for (i) immediate initiation of breastfeeding and (ii) baby kept warm.
¶For these indicators with >5 observations per cell, validation analyses were not conducted due to >5% "don't know" responses.

For additional immediate newborn care indicators assessed, women's recall during exit interviews had high overall validity for immediate drying of the newborn and the application of chlorhexidine on the newborn's cord. Women's recall of whether mother and newborn were kept in the same room after delivery nearly met the criteria for high overall validity, with AUC = 0.69 (95% confidence interval (CI) = 0.61-0.77) and IF = 1.00. For whether the baby was weighed at birth, health worker documentation met the criterion for low population-level bias.
For indicators related to low prevalence newborn outcomes, health worker documentation met high overall validity for whether a baby was stillborn and high individual-level accuracy for whether a newborn had low birthweight.

DISCUSSION
Providing high quality facility-based childbirth care with a skilled provider is essential for improving the health and survival of women and newborns. Accurate information on the care received is essential to monitoring progress. In Gombe state, where women predominantly seek childbirth care in rural primary health facilities, our study suggests that health worker documentation in facility registers, facility-level exit interviews, and household-level follow-up interviews can all contribute to accurate monitoring, but no individual method provided a broad understanding of the provision and experience of childbirth care.
Our validation of health worker documentation against a gold standard of birth observations differed from other accuracy studies of facility-based data. To date, studies assessed the extent to which data sources agreed when aggregated, reflecting the critical capacity to tally and report consistently between levels of the health system. Focusing on individual-level validity, health worker documentation had high validity (AUC≥0.70 and/or 0.75<IF<1.25) for select indicators about the main provider, maternal background characteristics, and newborn outcomes. Unsurprisingly, health workers were well-positioned to determine the provider's cadre and newborn outcomes such as stillbirths. Maternal background characteristics were also relatively stable data which could be verified during the antenatal period.
However, health worker documentation did not meet any validity criteria for essential newborn care, a composite indicator of immediate breastfeeding and keeping the baby warm. As noted earlier, the prevalence for essential newborn care within 30 minutes of birth documented by the health worker was 95% (95% CI = 90%-97%), whereas the observed prevalence for immediate breastfeeding and placing the newborn skin-to-skin within one hour of birth was only 39% (95% CI = 26%-53%); health workers markedly overestimated the prevalence. Given the complexity of the essential newborn care definition, this may reflect the format of the documentation source which did not distinguish between care elements, as well as potential differences in interpretation between the observer and the health worker.
Our study adds new evidence to the validity of women's self-reports at different recall periods and focused on women who delivered in rural primary health facilities. We found that exit interviews had high validity for four immediate newborn care practices: drying the newborn with a towel; placing the newborn skin-to-skin; immediate breastfeeding; and applying chlorhexidine to a newborn's cord. In contrast to our study, two validation studies using hospital exit interviews in Mexico and Kenya did not report high validity for immediate drying of the newborn, placing the newborn skin-to-skin, and immediate breastfeeding [18,19]. Facility environment may explain part of the differences observed, which may in turn influence the frequency of "don't know" responses or the low specificity from a positive facility reporting bias [18,19]. For example, in our study, the practice of placing the newborn with the mother immediately after birth was 97%, compared to 10% in Mexico and 58% in Kenya.
Similar to other validation studies, we found that women's self-reports during follow-up nine to 22 months after childbirth had low validity across indicators assessed. Placing a newborn skin-to-skin immediately after birth was the one exception, consistent with a follow-up study in Mozambique which included a nation-wide sample of rural and urban health facilities, but inconsistent with the Kenyan study [16,20]. One possible explanation for this being a memorable event for northeastern Nigerian women may be that the practice of immediate skin-to-skin contrasts with longstanding cultural beliefs on early bathing of newborns and the negative perceptions of vernix [43,44].
Indicators that met criteria for low population-level bias only, such as the administration of prophylactic uterotonic (exit, follow-up), permission to drink and eat during labor (exit), and baby weighed at birth (maternity register) had high prevalence, which masked a high false positive rate among the small number of clients that did not receive the service. Thus, we recommend caution when interpreting these indicators and triangulation with other data sources.
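A worked example with hypothetical values (not taken from the study) illustrates how high prevalence can mask a high false positive rate. Suppose 95% of clients receive a service, nearly all of them report it, and 80% of those who did not receive it also report it. The AUC line assumes that, for a binary indicator, the AUC equals the mean of sensitivity and specificity.

```latex
\begin{align*}
P &= 0.95, \quad Se = 0.98, \quad Sp = 0.20 \\
\hat{P} &= P \cdot Se + (1 - P)(1 - Sp) = 0.931 + 0.040 = 0.971 \\
\mathrm{IF} &= \hat{P} / P = 0.971 / 0.95 \approx 1.02 \\
\mathrm{AUC} &= (Se + Sp) / 2 = (0.98 + 0.20) / 2 = 0.59
\end{align*}
```

In this illustration the IF sits comfortably within 0.75 to 1.25, yet the AUC falls below 0.70, because the false positives come from only 5% of clients and barely move the population estimate.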

Our findings highlight the importance of expanding the sources of data for monitoring the content of childbirth care. In addition to standard household surveys, monitoring of facility-based childbirth care should consider drawing from and linking multiple data sources including routine health facility data and exit interviews with recently delivered women. Facility-based routine data, such as registers, and exit interviews are useful sources for determining an accurate numerator when monitoring facility-based care; linkages to population-level data are still critical to determine the denominators for population in need and underserved subgroups [13]. At a global level, as greater emphasis is placed on respectful maternity care and the clients' experience of care, exit interviews are being included in the monitoring frameworks for assessing the quality of facility-based care [45]. Further, recent calls for greater investment in routine health information systems, if successful, would allow for monitoring beyond the global and national levels, as routine data are available at a greater level of disaggregation and frequency [13,46,47].
The limitations of exit interviews and routine data still need careful consideration, however. Facility registers capture limited information about service delivery and, hence, provide a narrower but more frequent picture of quality of care. Health worker documentation and exit interviews are susceptible to reporting biases, whereby health workers record information only for the services they provide and women report receiving an intervention because of social desirability bias or a higher quality of care that might be assumed with a facility delivery [17,18].
Among the strengths of this study was the use of birth observations as the gold standard, which was compared to facility exit interviews, household follow-up interviews, and health worker documentation in maternity registers. The longitudinal study design allowed us to assess the validity of women's self-reports for different recall periods: before discharge from a facility and at nine to 22 months after childbirth, which more closely reflects the recall period and interviewing conditions of household surveys. Further, this study was novel as this setting was predominantly rural, based in the primary health care context, and included validation of health worker documentation in maternity registers. Among the limitations of the study, our findings primarily reflect the reporting accuracy of women who seek facility-based care. Further, women participating in household surveys are not usually interviewed twice; however, individual-level reporting accuracy decreased in our study, which is different from what we would expect for repeated measurements. The gold standard could be susceptible to error from incorrect observer interpretation, errors in data recording, or changing behaviour because of the Hawthorne effect, even in the presence of quality control mechanisms [48]. Even with pre-testing, the questions in the exit and follow-up interviews may not have been interpreted as intended. Further, some observed indicators had such high or low coverage that they were unsuitable for validation analyses. Finally, while not strictly a limitation, relatively stringent cut-off criteria were chosen for AUC and IF to align with other studies [23].

CONCLUSION
The childbirth process presents a time of great risk of death for women and newborns. Health worker documentation, facility-level exit interviews, and household-level follow-up interviews with women after childbirth each have a role to play in the accurate monitoring of facility-based childbirth care to improve the health and survival of women and their newborns.