Evaluating consistency of recall of maternal and newborn care complications and intervention coverage using PMA panel data in SNNPR, Ethiopia

Background There is recognition that effective interventions are available to prevent neonatal and maternal deaths, but producing reliable and valid coverage estimates remains a challenge. Household surveys rely on recall of self-reported events that may span up to five years, raising concerns about recall bias. Objective This study assessed the reliability of maternal recall of pregnancy, delivery, and postpartum events over a six-month period and identified individual characteristics associated with inconsistent reporting. Methodology A longitudinal household survey was conducted with 321 pregnant women in 44 enumeration areas in the Southern Nations, Nationalities, and Peoples' Region (SNNPR) of Ethiopia. Women who were six or more months pregnant were enrolled and interviewed at seven days, six weeks, and six months postpartum using an identical set of questions regarding maternal and neonatal health and receipt of select neonatal care interventions. We compared responses given at seven days to those reported at six weeks and six months and estimated sensitivity, specificity, area under the receiver operating characteristic (ROC) curve, and kappa for selected indicators. Results We find that reporting of complications is higher at the first interview after birth than at either the six-week or six-month interview. The specificity of the majority of complications is high; however, sensitivity is generally much lower. The sensitivity of reporting any complication during pregnancy, delivery, or the postpartum period ranged from 54.5% to 67.6% at the six-week interview and from 39.2% to 63.2% at the six-month interview. Though sensitivity of receipt of neonatal interventions was high, specificity and kappa demonstrate low consistency. Conclusion As with childbirth, it may be that during the first seven days women note symptoms with higher scrutiny, but if these do not later develop into serious health issues, they may be forgotten over time. Maternal complications and care are likely to be under-reported by women if they are interviewed about distant events.


Introduction
There is recognition that "we know what works" to prevent neonatal and maternal deaths and that proven, effective, and cost-effective interventions are available [1-20]. Though health management information systems (HMIS) can be used to track coverage of some interventions, the majority of data on interventions received through community outreach or in the home rely on household surveys such as the Demographic and Health Surveys (DHS) and Multiple Indicator Cluster Surveys (MICS) [21]. Due to this reliance on household surveys, a growing body of research in low- and middle-income countries (LMICs) is assessing the validity of self-report of receipt of interventions.
The majority of reproductive, maternal, newborn, and child health (RMNCH) validation studies assess the consistency of recall between facility-based interventions and women's subsequent report. They generally find that the ability of women to report accurately depends on the intervention itself, on the delivery experience, on characteristics of women, and on the time elapsed since receipt of services [21-24]. Recent research also suggests that the ability of women to recall and report pregnancy and delivery events is influenced by the type of measure; concordance between maternal recall and obstetric records may be better for continuous measures (such as gestational age and birthweight) than for dichotomous measures (such as the presence or absence of specific delivery events) [25].
To address sample size challenges, the majority of household surveys rely on women retrospectively recalling birth events over a two to five-year window. Evidence has been mixed on whether the recall period has an effect on the ability of women to report accurately [26,27]. While recent work by Moran and colleagues found that at the population level, reporting of some indicators did not seem to be affected by time since birth, they were not able to assess whether individual responses were consistent over time due to the cross-sectional nature of household surveys [28]. Stanton and colleagues found that recall within a twelve-month period was problematic, while Stewart and colleagues found no relationship between the recall period and the validity of women's report. None of the studies, however, reviewed consistency of reporting over time and whether consistency of reporting varied by time, indicator, or background characteristics of the respondent.

Country context
In recent years, Ethiopia has put forth substantial effort to address maternal, neonatal and under-5 mortality, but maternal and newborn health service utilization remains low. In Southern Nations Nationalities and Peoples Region (SNNPR), the site of this research, approximately 30.4% of women who had a birth in the five years before the survey received no antenatal care, 73% delivered in their home and 19% received any postnatal care [29].
Utilizing a longitudinal study design, the Performance Monitoring and Accountability 2020 Maternal and Newborn Health (PMA-MNH) study was conducted in SNNPR to monitor the use of proven interventions to reduce maternal and neonatal mortality. A battery of questions was repeated at each interview to assess consistency in maternal recall of events surrounding the pregnancy, delivery, and postpartum period. Using these data, we aim to: (1) assess the reliability of maternal recall of pregnancy, delivery, and postpartum experience of complications through self-report of symptoms; (2) assess the reliability of maternal recall of neonatal care received in the immediate postpartum period; (3) assess the reliability of maternal recall of neonatal illnesses experienced in the first seven days of life; and (4) identify individual characteristics associated with consistent and, conversely, inconsistent recall of events.

Study design
The study design was a longitudinal household survey to collect knowledge, practice, and coverage information of maternal and newborn care interventions. The study utilized the existing survey platform of PMA2020, which has been implemented in Ethiopia since 2013. PMA2020 is a survey platform, operational in 11 countries, that trains and employs local women to serve as enumerators, conducting face-to-face interviews on smartphones to generate rapid turnaround data [30]. In Ethiopia, PMA2020 is a collaboration between Johns Hopkins Bloomberg School of Public Health (JHSPH), Addis Ababa University (AAU), the Federal Ministry of Health (FMoH) and the Ethiopia Public Health Association (EPHA).
The enumeration areas for PMA2020 were selected using a two-stage stratified cluster sampling design, with enumeration areas selected with probability proportional to size within urban and rural strata. At the time of the launch of the PMA-MNH survey, PMA2020 had conducted four rounds of data collection in 47 enumeration areas (EAs) in SNNPR. After considering the logistical challenges of conducting longitudinal data collection, three EAs were dropped, resulting in a total of 44 EAs participating in the initial screening of households for PMA-MNH. In one EA, no women were reported to be 6-9 months pregnant; after the field supervisor verified this, the EA was dropped from the longitudinal follow-up. The final 43 EAs were included in PMA-MNH.
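The probability-proportional-to-size selection described above can be illustrated with a minimal sketch of systematic PPS sampling. This is not the PMA2020 sampling code: the function name, the household counts, and the choice of systematic selection are assumptions made for illustration. In a stratified design the routine would be run separately within the urban and rural strata.

```python
import random
from itertools import accumulate


def pps_systematic_sample(sizes, n_clusters, rng):
    """Return indices of clusters chosen with probability proportional to size.

    Hypothetical helper: walks a fixed sampling interval along the
    cumulative size scale, starting from a random point.
    """
    total = sum(sizes)
    interval = total / n_clusters
    start = rng.random() * interval  # random start in [0, interval)
    cumulative = list(accumulate(sizes))

    selected = []
    idx = 0
    for k in range(n_clusters):
        point = start + k * interval
        # Advance to the cluster whose cumulative range contains the point.
        while idx < len(sizes) - 1 and cumulative[idx] <= point:
            idx += 1
        selected.append(idx)
    return selected


# Hypothetical household counts for eight enumeration areas in one stratum.
ea_sizes = [120, 80, 200, 50, 150, 100, 300, 90]
chosen = pps_systematic_sample(ea_sizes, n_clusters=3, rng=random.Random(2020))
print(chosen)
```

Larger areas span more of the cumulative scale and are therefore more likely to contain a sampling point. An area larger than the sampling interval can be hit more than once, which real designs usually handle as a certainty selection.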

Sampling strategy
The study team first conducted a complete census of the 44 EAs and administered a brief household survey. All household members were enumerated and all women between the ages of 15 and 49 were screened. Women who were six or more months pregnant, by self-report of duration of pregnancy, were eligible to participate in the longitudinal study. A household questionnaire and an individual questionnaire were completed at the time of enrollment. During the postpartum period, resident enumerators (REs) returned to administer questionnaires in person at seven days and six weeks postpartum, and either called or returned in person at six months postpartum to administer the final questionnaire. REs obtained the contact information of women during enrollment and maintained frequent communication during the study period, particularly around the anticipated delivery date. A set of questions regarding maternal symptoms experienced during and after pregnancy, neonatal symptoms experienced during the first seven days of life, and receipt of select neonatal care interventions was asked at each interview to evaluate consistency in reporting. Women were not asked to report on specific diagnoses they may have received either for themselves or their infants, only on specific symptoms that they experienced.

Ethical clearance
Ethical approval for this study was granted by the Johns Hopkins Bloomberg School of Public Health and the Ethiopian Public Health Institute (EPHI) Institutional Review Boards. Verbal consent was obtained from participants. The IRB approves verbal consent procedures (without a need for written consent) for simple surveys without any invasive procedures in an environment where literacy is low. Detailed information about the ethical guidelines required by EPHI can be found in the National Research Ethics Review Guidelines (http://www.ccghr.ca/wp-content/uploads/2013/).
Table 1 shows the number of women and the number of live infants for which information was collected at each interview. Overall, response rates were high; 97.6% of all women originally enrolled completed the final interview.

Analysis
We evaluated the extent of recall bias in the reporting of maternal and neonatal complications and care received by estimating the sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve of selected indicators. We treated the seven-day interview as the standard and compared the consistency in responses at the six-week and six-month interviews. Comparisons of responses to identical questions at each survey round were performed for women who participated in both the seven-day interview and the respective follow-up interview; thus, the six-week comparison has 322 responses and the six-month comparison has 321.

Table 1. Results of household, female and MNH screening 7-day, 6-week and 6-month postpartum interviews (unweighted).

Sensitivity is the proportion of positive responses at the seven-day postpartum interview that were again reported as positive at the follow-up (six-week or six-month) interview, while specificity is the proportion of negative responses at the first interview that were again reported as negative at follow-up. A convenient way to summarize these measures is the area under the ROC curve. An ROC area of 1.0 represents perfect recall (all follow-up responses match the seven-day responses), while an area of 0.5 indicates agreement no better than chance. In addition, we used two other measures of agreement: Cohen's kappa coefficient and percent agreement (the share of responses in the diagonal cells of the two-by-two table). Kappa is considered more robust than simple agreement because it accounts for agreement expected by chance.
Finally, we assessed whether specific respondent characteristics (age, residence (urban/rural), education, parity, facility delivery, receipt of caesarean section, and a visit from a health extension worker (HEW) in the first seven days postpartum) were associated with inconsistent reporting of any complications or illnesses and of three high-impact interventions (whether the baby was wrapped, whether breastfeeding was initiated within one hour of delivery, and whether the baby was placed naked on the mother's chest immediately after delivery). A binary variable for each event (either reported complication or receipt of intervention) was created to indicate whether the respondent reported consistently across all three interviews or was inconsistent in one interview, and a multivariable logistic regression analysis was run to assess whether any sociodemographic variables were associated with inconsistent reporting.
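The agreement measures described above can be computed from a single two-by-two table that cross-classifies the seven-day report (treated as the standard) against a follow-up report. The sketch below is illustrative only: the function name and the counts are hypothetical, and it assumes the ROC area is taken for a single yes/no measure, in which case it equals the mean of sensitivity and specificity.

```python
def agreement_metrics(tp, fn, fp, tn):
    """Agreement of a follow-up report with the seven-day report.

    tp: reported at seven days and at follow-up
    fn: reported at seven days, not at follow-up
    fp: not reported at seven days, reported at follow-up
    tn: not reported at either interview
    """
    n = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    # A single binary measure gives an ROC curve with one interior point,
    # so the area reduces to the mean of sensitivity and specificity.
    roc_area = (sensitivity + specificity) / 2
    agreement = (tp + tn) / n  # share of responses in the diagonal cells
    # Agreement expected by chance, from the row and column margins.
    expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / n**2
    kappa = (agreement - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "roc_area": roc_area,
        "agreement": agreement,
        "kappa": kappa,
    }


# Hypothetical counts, not taken from the study tables.
metrics = agreement_metrics(tp=60, fn=40, fp=20, tn=200)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, percent agreement is high while kappa is only moderate, because most of the agreement comes from the large number of women who report no complication at either interview.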
Variables were categorized as follows; age into three age groups (15-24, 25-34, 35-49), education into three groups (no school, any primary, and any secondary or above), parity into two groups (first birth versus higher order), and home delivery versus any facility delivery.
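A minimal sketch of how such derived variables might be coded follows. The function names and input conventions (age in completed years, number of previous births, yes/no answers as booleans) are assumptions for illustration and do not come from the study's Stata code.

```python
def age_group(age):
    """Map age in completed years to the three analysis groups."""
    if 15 <= age <= 24:
        return "15-24"
    if 25 <= age <= 34:
        return "25-34"
    if 35 <= age <= 49:
        return "35-49"
    raise ValueError("age outside the 15-49 eligibility range")


def parity_group(previous_births):
    """First birth versus higher order (previous_births excludes the index birth)."""
    return "first birth" if previous_births == 0 else "higher order"


def inconsistent(day7, week6, month6):
    """Return 1 if the three yes/no answers are not all identical, else 0."""
    return int(not (day7 == week6 == month6))


# Hypothetical respondent: reported a complication at seven days only.
print(age_group(29), parity_group(0), inconsistent(True, False, False))
```

Because each answer is binary, three answers that are not all identical always have exactly one interview that differs from the other two, which matches the "inconsistent in one interview" definition.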
Survey weights were not applied in this analysis except to demonstrate representativeness of the sample characteristics when adjusted for the complex survey design. All analyses were conducted using Stata 15.1.

Results
Background characteristics of women collected during the initial screening and the first follow-up interview at seven days postpartum are summarized in Table 2. The majority of respondents were married (97.1%), between the ages of 25 and 34 (51.8%), and had given birth to at least four children (54.0%). Table 3 below shows the numbers of women reporting any maternal complications during pregnancy, delivery, and immediately postpartum. Only those women who reported a complication were asked if they sought treatment; thus, among women who reported a complication during only one interview, their report of care-seeking is missing in the comparison interview. This restriction substantially reduced the number of women who contributed information on whether treatment was received, particularly for postpartum complications.
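The effect of this skip pattern on the comparison sample can be sketched as follows. The records are hypothetical, and the function name is an assumption for illustration; the point is only that a woman contributes to the treatment comparison when she reported a complication at both interviews.

```python
def treatment_comparison_sample(records):
    """Keep women who reported a complication at both interviews.

    Each record is a (reported_at_day7, reported_at_follow_up) pair of
    booleans. Only these women were asked the treatment question twice.
    """
    return [r for r in records if r[0] and r[1]]


# Hypothetical reports for five women.
records = [(True, True), (True, False), (False, True), (False, False), (True, True)]
comparable = treatment_comparison_sample(records)
print(len(comparable), "of", len(records), "women contribute to the treatment comparison")
```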

Maternal complications
Complications during pregnancy. As shown in Table 4 and Table 5, more women reported experiencing complications during pregnancy at the seven-day interview than at the six-week and six-month interviews, for all complications. Specific complications are arranged in order of lowest to highest sensitivity. While 52.8% of women reported at least one pregnancy complication at the seven-day interview, only 41.0% reported any during the six-week follow-up interview and 38% reported any during the six-month interview.
At the six-week interview, when asked to report specific complications, report of vaginal bleeding during pregnancy had the lowest indicators of agreement while edema had the highest. At the six-month interview, all complications had lower sensitivity, ROC, and kappa values than at the six-week interview, with the exception of vaginal bleeding, although none of the differences is statistically significant. At the six-month follow-up, convulsions had the lowest sensitivity, ROC, and kappa statistics, while edema remained the most consistently reported, with the highest sensitivity, ROC, and kappa statistics. Sensitivity, specificity, ROC, agreement, and kappa were comparable at the six-week and six-month interviews for the report of any complication during pregnancy. The statistics for the six-month interview are all marginally, though not statistically significantly, lower. The exception to this is a statistically significant reduction in the kappa statistic for reporting of fever and convulsion between the two interviews.

Table 3. Frequency of reporting of any maternal complications during pregnancy, delivery, and post-partum between 7-day and 6-week and 6-month interview.

Complications during delivery. Fewer women reported complications during delivery than complications during pregnancy in all interviews. Overall, 38.8% of women reported at least one delivery complication during the seven-day interview. At the six-week visit, only 26.2% of women reported a delivery complication, and this dropped to 24.9% at the six-month visit. At the six-week interview, sensitivity for all delivery complications was low, ranging from 14.3% (95% CI: 0.4-57.9) among women reporting a leaking membrane for more than 24 hours to 55.4% (95% CI: 42.5-67.7) among women reporting prolonged labor.
While the sensitivity of women who reported receiving treatment for a complication was 100% in both the six-week and six-month interview, the specificity was lower, with more women reporting that they received treatment in each of the follow-up visits than at the initial interview.
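The wide interval reported above for leaking membranes (14.3%, 95% CI: 0.4-57.9) is what an exact binomial (Clopper-Pearson) interval yields for one positive report among seven women. The text does not state the interval method or the underlying counts, so both are inferences here; the sketch below, with hypothetical function names, simply shows how such an interval can be computed with the standard library.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(g, target, iterations=100):
    """Bisection on [0, 1] for the p where an increasing g(p) reaches target."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(x, n, alpha=0.05):
    """Clopper-Pearson interval for x successes in n trials."""
    if x == 0:
        lower = 0.0
    else:
        # Lower bound: P(X >= x | p) = alpha / 2
        lower = _solve_increasing(lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    if x == n:
        upper = 1.0
    else:
        # Upper bound: P(X <= x | p) = alpha / 2
        upper = _solve_increasing(lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


# Assumed counts: 1 of 7 women repeating the report at follow-up.
low, high = exact_binomial_ci(1, 7)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

With so few women reporting a given complication at the seven-day interview, sensitivity estimates for individual complications carry very little precision, which is why the aggregated "any complication" measure is easier to interpret.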

Post-partum complications. Approximately 30% of women reported experiencing at least one post-partum complication at the seven-day visit. At the six-week interview, 25.1% of women reported a post-partum complication, and by the six-month interview this fell to 20.5%. Fever was the most commonly reported complication in all interviews. At the six-week interview, the sensitivity ranged from 43.9% (95% CI: 28.5-60.3) to 52.9% (95% CI: 27.8-77.0) for postpartum hemorrhage and retained placenta, respectively, while specificity was above 90% for all complications. The area under the ROC curve was approximately 0.70 for all three post-partum complications and agreement ranged from 85.0% to 96.2%. Though there were no statistically significant differences in sensitivity and specificity between the six-week and six-month interviews, all of the kappa values were statistically significantly lower, and by the six-month interview all kappas were less than 0.40, indicating poor agreement.

Neonatal care and symptoms
Tables 6 and 7 below show the numbers of neonates for whom a mother reported receiving neonatal care and any neonatal illness symptoms that occurred between birth and the first follow-up interview, along with the associated measures of consistency.

Immediate neonatal care
As shown in Table 6 and Table 7, indicators of agreement remained relatively constant over the six-month period. Slightly more women at the follow-up interviews than at the initial interview reported that the baby was immediately placed on the mother's chest (48.9% at the 7-day, 51.1% at the 6-week, and 56.9% at the 6-month). While approximately equal percentages of women reported that breastfeeding started within an hour at each interview (83.2.0% at the 7-day, 84.0% at the 6-week, and 85.8% at the 6-month), the specificity was low at both the 6-week and 6-month interviews (64.8% and 51.9%, respectively). Of the three immediate neonatal care indicators, reporting whether the neonate was wrapped within five minutes of delivery was the least consistent. The specificity, ROC, and agreement all showed modest declines between the 6-week and 6-month interviews, and the kappa value declined from 0.39 to 0.29.
Overall, report of specific neonatal illness symptoms experienced during the first seven days was low across all interviews, with cold being the most common illness reported in all three interviews. Women reported more symptoms at the seven-day and six-week interviews than at the six-month interview. Pus or redness in the umbilicus, eye infection, inability to pass urine, and difficulty breathing were reported at both the seven-day and six-week interviews, but not at the six-month interview, while hypothermia was reported during the seven-day and six-month interviews, but not during the six-week interview. Slightly more women reported that their infant had an illness symptom at the first interview (25.3%) than at the second (18.9%) and third (20.4%). While the overall specificity was relatively high for reporting no illness, the sensitivity was low in both follow-up interviews.

Characteristics associated with discordant reporting over time
Characteristics of the respondents were more strongly associated with inconsistent reporting of newborn health over time than with inconsistent reporting of maternal health (Table 8). Women aged 35 and above had twice the odds of inconsistently reporting that they experienced a post-partum complication relative to women aged 15-24, although the association is only marginally significant (OR: 2.03, p < .10). Primiparous women, women aged 25-34, and women who received a post-natal care visit from an HEW within the first seven days had higher odds of inconsistently reporting that their infant was sick within the first seven days of life than their counterparts. In the bivariate associations, the odds of inconsistently reporting whether the baby was placed naked on the mother's chest were 2.5 times higher among women who delivered in a health facility compared to those who delivered at home (p < .05). Women who had a caesarean delivery had twice the odds of inconsistently reporting that the baby was placed naked on the chest compared to women who did not, although this is only marginally significant (OR: 2.00, p < .10). Similarly, the odds of inconsistently reporting that the baby was wrapped were higher among facility deliveries relative to home deliveries (OR: 1.68, p < .05).
Once adjusted, most, though not all, of these relationships were no longer significant (Table 9). Age was associated with inconsistently reporting postpartum complications, and women who saw an HEW for post-natal care in the first seven days had 3.29 times the odds of inconsistently reporting that their child was ill in the first seven days than women who did not see an HEW (p < .01). After adjustment, women who delivered in a facility had higher odds of inconsistently reporting whether the baby was placed naked on the chest compared to women who delivered at home (aOR 2.07, p < .10).
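For readers less familiar with the bivariate odds ratios reported above, the following sketch shows how an unadjusted odds ratio and an approximate 95% confidence interval can be obtained from a two-by-two table. The counts and the function name are hypothetical, the interval uses the standard log-odds (Woolf) approximation, and the adjusted estimates in Table 9 come from multivariable logistic regression, which this sketch does not reproduce.

```python
from math import exp, log, sqrt


def odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Woolf (log-scale) confidence interval.

    a: exposed and inconsistent      b: exposed and consistent
    c: unexposed and inconsistent    d: unexposed and consistent
    """
    estimate = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(estimate) - z * se_log)
    upper = exp(log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: facility deliveries (exposed) versus home deliveries.
or_hat, ci_low, ci_high = odds_ratio(a=30, b=60, c=40, d=190)
print(f"OR {or_hat:.2f} (95% CI: {ci_low:.2f}-{ci_high:.2f})")
```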

Discussion
This study assessed the consistency of women's self-report of pregnancy, delivery, and postpartum complications and the consistency of maternal report of neonatal health complications over a six-month period. We note that women may have trouble identifying symptoms accurately and our estimates should not be taken as population level estimates of complications.
Our intention was to assess whether women consistently reported the experience of specific symptoms over time. Overall, we find that more women report that they experienced complications at the first interview after birth than at either the six-week or six-month interview. The specificity of the majority of complications is high, meaning that women are not likely to report a complication in later interviews that they did not report initially; however, sensitivity is generally much lower and remained approximately constant over time. This indicates that while recall does decline over time, it seems to decline rapidly after birth and then remain consistent once no further complications develop.
A higher percentage of women reported having received treatment in the first seven days at the follow-up interviews compared to the initial seven-day interview. Overall agreement is acceptable, but it is important to note that the samples are not identical in each interview. We could only evaluate agreement for whether treatment was sought among the women who reported that they experienced a complication in both rounds. Report of care-seeking and receipt of treatment during pregnancy and childbirth, particularly if derived from questions that rely on skip patterns regarding experience of complications, thus may not be accurate for distant events, as both showed inconsistencies even within a six-month time period. Asking whether any care was sought during delivery and in the immediate postpartum period, regardless of whether any complications were reported, may yield a more accurate estimate of care-seeking.
Few newborn illness symptoms were reported in the first seven days of life, and the resulting small sample sizes limited our ability to assess the consistency of recall with precision. When aggregated to a measure of whether any symptoms at all were experienced in the first seven days of life, sensitivity was low.
It may be that during the first seven days women note symptoms with higher vigilance, but if these do not later develop into serious health issues, they may be forgotten over time. Additionally, events that occur within a specific time frame (e.g. within five minutes, within one hour) may become increasingly challenging to report accurately, as shown by the decline in specificity over time. This is in keeping with other studies that demonstrate low validity when reporting time-bound indicators [23,31,32]. The measure of skin-to-skin contact performed the best of the three neonatal indicators, in keeping with the finding of Stanton and colleagues [23] that this indicator was among the most valid when self-report was compared to facility observation.
Though other studies have found that accurate reporting of complications is associated with age and parity, we found few statistically significant relationships between sociodemographic characteristics and consistent recall of maternal events [26]. While place of delivery does appear to influence whether women report complications consistently, our sample size was not large enough to confirm that these relationships were statistically significant. The literature on recall surrounding pregnancy events has relied heavily on verification using facility-based records and generally excludes women who delivered in the home. This study showed that there was a non-significant difference in recall consistency for drying and placing the baby on the chest between women who delivered at home and women who delivered in a facility. Based on previous studies, it may be better to continue to rely on facility records for coverage of these interventions in the facility, but estimates of true population coverage should include these indicators in surveys for women who delivered at home. In countries with low facility delivery, such as Ethiopia, failing to include women who delivered at home may introduce a substantial degree of selection bias.
Our study is not without limitations, primarily the small sample size, which resulted in few cases of several symptoms being reported and therefore wide confidence intervals around several estimates. Though we attempted to address this by aggregating into whether the woman reported any or no symptoms, in doing so we lost detail. The small sample size also limited our ability to identify statistically significant relationships. Our study has a number of strengths, however, first among them the longitudinal design with low loss to follow-up. Less than 5% of the initial sample was lost during the study period, and those who were lost did not differ substantially in background characteristics from women who were retained. Secondly, we included all women, including women who delivered at home, which reduces the selection bias associated with sampling only women who delivered in health facilities.

Conclusions
We find that for the majority of maternal and newborn complications surrounding birth, sensitivity of self-report within six months is low, meaning women may either over-report complications immediately after birth or under-report at later points if no further complications arise. Questionnaires designed to assess care-seeking during the peripartum period should not rely on skip patterns based on self-report of complications, as these estimates may be subject to recall bias. Additionally, questions that depend on recall within time-bound periods, such as within an hour of birth or within the first seven days of life, are subject to recall bias even over short recall periods. More work is needed in instrument development to improve questions that relate to very specific time periods. Much of the data on the coverage of interventions is generated from population-based surveys, and our results identify an important limitation of currently fielded questions. Studies using longitudinal data on individual recall should be conducted in other settings to determine the replicability of our findings.