Measuring clients’ experiences with antenatal care before or after childbirth: it matters

Background When clients’ experiences with maternity care are measured for quality improvement, surveys are administered once, usually six weeks or more after childbirth. Most surveys conveniently cover pregnancy, childbirth and postnatal care all in one. However, the validity of measuring the experiences during pregnancy (antenatal experiences) after childbirth is unknown. We explored the relation between the measurement of antenatal experiences late in pregnancy but prior to childbirth (‘test’ or gold standard) and its retrospective measurement after childbirth (retrospective test). Additionally, we explored the role of modifying determinants that explained the gap between these two measurements. Methods and Findings Clients’ experiences were measured with the ReproQuestionnaire, which consists of an antenatal and a postnatal version and covers the eight WHO Responsiveness domains. 462 clients responded to the antenatal and postnatal questionnaire, and additionally filled out the repeated survey on antenatal experiences after childbirth. First, we determined the association between the test and retrospective test using three scoring models: mean score, equal to or above the median score, and having a negative experience. The association was moderate for having any negative experience (absolute agreement = 68%), for the median (absolute agreement = 69%) and for the mean score (ICC = 0.59). Multiple linear and logistic regression analysis for all three scoring models revealed systematic modifiers. The gap between antenatal and postnatal measurement was (partly) associated with clients’ experiences during childbirth and postnatal care and with professional discontinuity during childbirth, but was unrelated to the perceived health outcome. Conclusions The antenatal experiences should be measured before and not after childbirth, as the association between the antenatal experiences measured before and after childbirth is moderate.


INTRODUCTION
Clients' experiences with care are considered to be an important independent indicator of health care performance (Valentine et al., 2003;Valentine, Bonsel & Murray, 2007). Besides being relevant in their own right, clients' experiences could also affect health outcomes through several pathways (Campbell, Roland & Buetow, 2000;Sitzia & Wood, 1997;Wensing et al., 1998;Williams, 1994). For example, clients who truly understand the explanation of their caregiver are more likely to comply with treatment or lifestyle change.
As clients' experiences are an independent indicator of performance, clients' experiences are systematically measured using surveys, usually held after the care-episode. Such measurements could help to identify areas for improvement (Haugum et al., 2014;Weinick et al., 2014). Targets of quality improvement are found by identifying health care organizations or areas with below average scores or single negative outliers on questions representing the characteristics of service delivery, e.g., communication and prompt access to services. Next, the organization develops and implements a plan to meet these goals, and verifies if the goals are met (UK Department of Health, 2010;Ellis, 2006;Ettorchi-Tardy, Levif & Michel, 2012;Kay, 2007).
Clients' experiences in maternity care are routinely measured in several countries. Data on clients' experiences are usually collected through surveys administered six weeks or more after childbirth. Most surveys cover pregnancy, childbirth and postnatal care in one measurement (Dzakpasu et al., 2008;Hay, 2010;Redshaw & Heikkila, 2010;Van Wagtendonk, Hoek & Wiegers, 2010;Wiegers et al., 1996). As these surveys cover about 9 months of care, with different health care professionals, settings and possibly events, measurement of clients' experiences is vulnerable to memory failure and/or changes in perception due to modifying intercurrent events that happened since the antenatal experiences. Assuming the antenatal measurement of such experiences to be the gold standard, the question is whether the response on the postnatal survey shows random and/or systematic error. Stated otherwise, when the clients' experiences are measured before childbirth and repeated after childbirth, does this lead to the same clients' experience scores? Ideally, valid measurement of antenatal experiences postnatally should not be systematically affected by the care process, experiences or outcomes that occur after antenatal measurement. Despite the widespread practice of a one-stage postnatal measurement, to our knowledge this question has never been explored. If random error is considerable or systematic shifts are present, the convenient one-stage measurement should perhaps be replaced by a two-stage measurement procedure that includes the measurement of clients' experiences not only after childbirth but also antenatally.
We explored the presence of memory effects in the measurement of clients' experiences in maternity care using the ReproQuestionnaire (ReproQ). ReproQ is the national survey for client experience measurement in childbirth care in the Netherlands. It was especially designed for a two-stage measurement procedure, consisting of antenatal and postnatal versions. ReproQ was extensively validated (n > 18,000) (Scheerhagen et al., 2015;Scheerhagen et al., 2016) and is currently regarded as one of the national maternity care indicators (CPZ, 2015).

ReproQuestionnaire
The ReproQ consists of two versions, each covering the experiences of two reference periods. The antenatal version covers the experiences during early and late pregnancy; the postnatal version covers the experiences during childbirth and postnatal care. Both versions are identical, in the sense that the same type of experiences is asked for, but items (questions) are contextually adapted. Altogether, a client is invited to judge a typical item for four consecutive periods.
The conceptual basis of the ReproQ was the WHO responsiveness model (Valentine et al., 2003;Valentine, Bonsel & Murray, 2007). The WHO developed this universally applicable concept that consists of four domains on the interactions of the client with the health professional (dignity, autonomy, confidentiality, and communication), and of four domains on the client orientation of the organizational setting (prompt attention, access to family and community support, quality of basic amenities, and choice and continuity of care) (Valentine et al., 2003;Valentine, Bonsel & Murray, 2007). The response mode of all the experience items uniformly consists of four categories: "never", "sometimes", "often", and "always", with a numerical range of 1 (worst) to 4 (best).
Additional sections of the ReproQ address the client's socio-demographic characteristics, details about the care process during pregnancy and childbirth, and maternal and infant health outcomes in non-medical terms as perceived by the mother. We also added a relevance question on which two out of eight domains were most important to the client.
Previous psychometric analyses showed that content and construct validity were good, as was the test-retest reliability of the experience during childbirth. Full details of the development and the psychometric properties of the questionnaire are described elsewhere (Scheerhagen et al., 2015;Scheerhagen et al., 2016).

Design, ReproQ scoring models, outcomes
The Medical Ethical Review Board, Erasmus Medical Center, Rotterdam, the Netherlands, approved the study protocol (study number MEC-2013-455).
The study was designed as a cohort study with three measurements. First, women received an invitation to fill out the antenatal ReproQ around a gestational age of 34 weeks. This is called 'test'. Second, women received an invitation to fill out the postnatal ReproQ six weeks after the expected date of childbirth. Non-responding women received a reminder two weeks after invitation to the antenatal and postnatal questionnaire. Third, we invited women who responded to the antenatal and postnatal ReproQ again to fill out the antenatal experiences after childbirth. This is called the 'retrospective test'. We sent the retrospective test at least 14 days after women filled out the postnatal ReproQ.
Three different scoring models exist to summarize clients' experiences and to monitor adverse outcomes at the individual or aggregate level. The three models may be applied to an individual item, to an individual domain (called domain score), to two summary scores of the four personal and four setting domains (called personal and setting score), or to a summary score of all domains (called total score). Table 1 displays the scoring models and their definitions. The first model creates a dichotomous variable (called 'negative score') at the client level, reflecting the presence of any so-called negative experience. As Table 1 shows, the definition of a 'negative' experience is based in part on the two domains that a client identifies as most important, thereby creating a personalized score. Since the likelihood of a negative experience partially depends on the number of items per domain, absolute percentages of negative scores cannot be compared across domains. The negative score model assumes that, for the individual client or for an organisation, a negative experience cannot be compensated by very good experiences on other items or domains. This is contrary to the mean score, where good experiences can compensate for poor experiences. The second scoring model computes a continuous mean score (called 'mean score', range 1.0-4.0) at the client level, for each domain or group of domains separately. The total, personal and setting summary scores are not the mean of all items involved in the domains, but the unweighted mean of the mean domain scores involved in that summary measure. For the calculation of the summary scores, each domain has the same weight, even if the domains rest on different numbers of items.
Finally, the third model creates a dichotomous variable at the client level reflecting whether her mean item, domain or summary score is equal to/above or below the median of the distribution of the respective item, domain or summary scores of all cases (called 'median score'). The 'median score' model was added because of the skewed distributions of clients' experience scores.
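As an illustration only, the mean and median scoring models described above might be sketched as follows. The function names and the example client are hypothetical and are not part of the ReproQ materials; the negative score is omitted because its definition is given in Table 1 and is not reproduced here.

```python
from statistics import mean, median


def summary_mean_score(responses, domains):
    """Unweighted mean of the domain means (range 1.0-4.0).

    responses: dict mapping a domain name to a list of item scores (1-4).
    Each domain counts equally, regardless of its number of items.
    """
    return mean(mean(responses[d]) for d in domains)


def median_scores(summary_scores):
    """Return 1 for scores equal to or above the sample median, else 0."""
    cutoff = median(summary_scores)
    return [1 if s >= cutoff else 0 for s in summary_scores]


# Hypothetical client with two domains that rest on different numbers of items.
client = {"dignity": [4, 4, 3, 4], "autonomy": [2, 3]}
total = summary_mean_score(client, ["dignity", "autonomy"])
```

In this made-up example the summary score is the mean of the two domain means (3.75 and 2.5), not the mean of all six items, which is what gives each domain the same weight.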

Data collection
ReproQ data were obtained from two sources: 10 perinatal units (a hospital with its associated community midwife practices) and two maternity care organizations. These organizations deliver postnatal care at home from childbirth onwards over a period of seven to 10 days. Women can register and apply for this service during pregnancy. For perinatal units, clients were invited to participate by their caregiver, who asked for consent.
For maternity care organizations, all women were invited to fill out the client experience questionnaire, after consent was ticked.
Data were collected in two periods. In the first period (October 2013 to January 2015), data were collected with the antenatal ('test') and postnatal ReproQ. There were no restrictions on inviting women to fill out the antenatal and postnatal ReproQ; all women could participate provided that informed consent was signed or ticked. In the second period (December 2014), the retrospective test was administered. Women were excluded from participation in the retrospective test for the following reasons: (1) they did not respond to the antenatal and postnatal questionnaires, (2) they filled out less than 50% of the antenatal and/or postnatal experience items, or (3) they filled out the questionnaires on paper (this was done for reasons of data management efficiency; n = 166). Women were excluded from analyses if they filled out less than 50% of the items of the retrospective test questionnaire, or if they filled out the retrospective test more than 1.5 years after childbirth. The latter criterion excluded women who could be pregnant again.

Measures of agreement
In this study we used two dichotomous scores and one continuous score for the domain and summary scores, with two different agreement statistics. For the negative and median scores, we used the percentage absolute agreement (AA), classified as 'excellent' (90%-100%), 'good' (75%-89%), 'moderate' (60%-74%), or 'poor' (<60%) (Singh et al., 2011). For the mean score, we used the Intraclass Correlation Coefficient (ICC) as measure of agreement (two-way mixed model, absolute agreement, single measure), and classified the estimated ICCs as: 'excellent' (≥.81), 'good' (.61-.80), 'moderate' (.41-.60), 'poor' (≤.40) (Singh et al., 2011). For the individual items, agreement between the test and retrospective test was quantified as the percentage absolute agreement. Figure 1 shows the analytic framework. All analyses were performed on the reported experience of the second half of the pregnancy, because in psychometric analysis the experiences during the first and second half of pregnancy are highly associated (AA Neg = 91.6%; AA MD = 85.9%; ICC = 0.83). The late antenatal experiences were chosen as comparator ('test' or gold standard), because the second half of pregnancy covers more antenatal check-ups than the first half, and was therefore thought to be more representative of the entire antenatal phase. Moreover, the timespan between the second half of the pregnancy and the retrospective test is smaller than the timespan between early pregnancy and the retrospective test, and therefore the risk of memory effects is probably smaller.
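The two agreement statistics can be sketched as below. This is an illustration under stated assumptions, not the authors' analysis code: the ICC is computed with the standard single-measure, absolute-agreement formula from two-way ANOVA mean squares, and the classification helpers treat the published bands as lower-bound thresholds (for example, anything from 75% up to just under 90% counts as 'good'), which is an assumption about values falling between the printed bands. All function names are hypothetical.

```python
def absolute_agreement(test, retest):
    """Percentage of clients with identical dichotomous scores on both occasions."""
    matches = sum(1 for a, b in zip(test, retest) if a == b)
    return 100.0 * matches / len(test)


def classify_aa(pct):
    """Bands from the text, read as lower-bound thresholds (assumption)."""
    if pct >= 90:
        return "excellent"
    if pct >= 75:
        return "good"
    if pct >= 60:
        return "moderate"
    return "poor"


def classify_icc(icc):
    """Bands from the text, read as lower-bound thresholds (assumption)."""
    if icc >= 0.81:
        return "excellent"
    if icc >= 0.61:
        return "good"
    if icc >= 0.41:
        return "moderate"
    return "poor"


def icc_absolute_single(pairs):
    """ICC, two-way model, absolute agreement, single measure.

    pairs: list of (test, retrospective test) mean scores, one pair per client.
    """
    n, k = len(pairs), 2
    grand = sum(a + b for a, b in pairs) / (n * k)
    row_means = [(a + b) / k for a, b in pairs]
    col_means = [sum(p[j] for p in pairs) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for p in pairs for x in p)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-client mean square
    msc = ss_cols / (k - 1)              # between-occasion mean square
    mse = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
```

Because the absolute-agreement form includes the between-occasion mean square in the denominator, a uniform shift between test and retrospective test lowers the ICC even when clients keep their rank order.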

Data analysis
We used all retrospective test data collected up to 1.5 years after childbirth (range: 3.5 months to 1.5 years after childbirth). The wide range had limited impact on the experience scores of the retrospective test and the association between the test and retrospective test; both slightly decreased over time.
First, we explored the crude agreement between the antenatal experiences measured before (test or gold standard) and after childbirth ('retrospective test'). For that purpose the three outcome measures were computed for (a) the total score, (b) the personal and setting summary scores, and (c) the individual domain scores, and subsequently the agreement between the gold standard and the retrospective test was calculated. The agreement of the individual items between the before (gold standard) and after childbirth (retrospective test) measurements was also calculated. While the domain and summary measures were calculated conventionally, for the individual item analyses we split the 'no-agreement' category into "test better experience than retrospective test" and "test worse than retrospective test". Second, we explored the effects of background characteristics and systematic effects of intercurrent events, as determinants of the antenatal total experience score as measured after childbirth. For the negative and median score models, we used multiple binary logistic regression analysis. For the continuous mean score model, we applied multiple linear regression analysis. The dependent variable was the antenatal total experience score as measured after childbirth; the independent variables were the antenatal total experience score as measured before childbirth (gold standard score) and a set of potentially modifying factors. The following sets of determinants were included in the regression model (enter method): socio-demographic characteristics, previous experiences with care (antenatal, childbirth and postnatal care), characteristics of the care process during pregnancy and childbirth including interventions during childbirth, and perceived health outcomes of mother and child.
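The three-way split used for the item analyses can be illustrated with a short sketch. It assumes that a higher item score means a better experience, in line with the 1 (worst) to 4 (best) coding; the function name and the labels are hypothetical.

```python
from collections import Counter


def item_change_profile(test, retest):
    """Percentages of agreement, 'test better' and 'test worse' for one item.

    test, retest: per-client scores for the same item, before and after
    childbirth; higher scores are assumed to mean better experiences.
    """
    counts = Counter()
    for before, after in zip(test, retest):
        if before == after:
            counts["agreement"] += 1
        elif before > after:
            counts["test_better"] += 1
        else:
            counts["test_worse"] += 1
    n = len(test)
    labels = ("agreement", "test_better", "test_worse")
    return {label: 100.0 * counts[label] / n for label in labels}
```

Reporting the two disagreement categories separately shows the direction of change, which a single agreement percentage would hide.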
Considering the abundance of possible determinants and the limited sample size, we included in the multivariable analyses only those that were determinants of clients' experiences during childbirth (M Scheerhagen, E Brinie, A Franx, HF Van Stel, GJ Bobsel, 2014-2015). A determinant was overall judged as significant if the estimated adjusted beta- or OR-coefficient was statistically significant (p < 0.05, two-sided) in at least two of these analyses, a conservative approach.
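The decision rule for overall significance can be written down compactly. The sketch below is illustrative only; the function name and the p-values are invented for the example and are not results from this study.

```python
def overall_significant(p_values, alpha=0.05, min_models=2):
    """True if a determinant is significant in at least `min_models` analyses.

    p_values: dict mapping a scoring model name to the two-sided p-value of
    the determinant's adjusted coefficient in that model.
    """
    return sum(1 for p in p_values.values() if p < alpha) >= min_models


# Hypothetical p-values for one determinant across the three scoring models.
example = {"negative": 0.03, "median": 0.20, "mean": 0.01}
is_significant = overall_significant(example)
```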
For the binary logistic regression analysis, the goodness of fit was assessed using the proportion of correct predictions. For linear regression we used the adjusted R².

RESULTS
Figure 2 shows the flow diagram. We invited 3,313 women for the retrospective test, of whom 1,091 women responded (33%). Of these, 629 women were excluded from analysis. The remaining 462 women were included. Table 2 presents the characteristics of the included women (n = 462). Mean age was 32 years (SD = 4.8). Half of the women gave birth for the first time. 26 (6%) women were of non-Western background; and 14 (3%) women reported having a low educational level. 241 (52%) women reported not knowing the health care professional who supervised their delivery. 70 (16%) women were referred to secondary care during their pregnancy and 144 (32%) were referred during parturition. 84 (18%) women reported that they felt unhealthy and that they were hospitalized after childbirth. Additionally, 59 women (13%) perceived their babies' health as unhealthy and reported that their babies were hospitalized. Table 3 shows the crude agreement between the antenatal experiences measured before and after childbirth for the summary and domain scores. For the total score, 35% of the women reported one or more negative experiences when filling out the 'test', and 33% when filling out the retrospective test. The absolute test-retrospective test agreement (AA) of 'having a negative experience' was 67.5% (CI [63.0-71.8%]). The absolute test-retrospective test agreement (AA) of 'a score above the median' was 69.6% (CI [65.2-73.8%]). The ICC of the total experience scores (mean test = 3.77; mean retrospective test = 3.69) was 0.59. The negative, median and mean score models all indicated a moderate association. The associations of the personal and setting scores were comparable for the negative and median score models, but the association for the mean personal score was weaker than for the mean setting score (ICC 0.49 vs. 0.59).
All individual domains showed a good to excellent association for having a negative experience. For the median and mean scores, all domain associations were moderate, except for Confidentiality, which had an ICC of 0.27, indicating a poor association.
The item analyses showed good to excellent associations for having a negative experience (see Table 4). For the median score, the associations varied from excellent to moderate, except for 'Influence on childbirth plan' (AA = 59.7%) which was poor. For the mean score, not only this item (AA = 56.6%) but also 'Waiting time for service' (AA = 57.7%) and 'Continuity of care provision when change of professional' (across disciplines) (AA = 55.2%), had a poor association. Table 4 also depicts the magnitude and direction of change between the before and after childbirth measurements. For the negative score, agreement was very high, indicating that scores were fairly stable between the test and retrospective test, with slightly more clients reporting negative scores at the test, the 'Birthplan' item being an exception. The median and mean scores showed more variability in scores between the test and retrospective test, with the overall trend of higher scores at the test. Table 5 shows the results of the regression analyses. The experience scores of the retrospective test were not significantly influenced by any of the socio-demographic characteristics. However, the retrospective test score was significantly associated with the women's antenatal, childbirth and postnatal experiences. Of the care process determinants, only professional continuity was relevant. Finally, the perceived maternal and infant health outcome had no significant influence on the retrospective test. Despite the different analyses and scoring models, the goodness of fit was comparable for the three measures (70-73%).

DISCUSSION
To determine the optimal timing of the collection of data on clients' antenatal experiences, we assessed the association between the antenatal experiences measured before and after childbirth for the summary, domain and item scores. The total score showed a moderate association, irrespective of the scoring model used. For the domain scores, the associations varied with the scoring model selected, being overall excellent for the negative score, and moderate for the median and mean scores. For the domains, agreement was quite uniform within the scoring model used. Confidentiality was the only domain with a poor association for the mean score. For the individual items, associations were particularly low for 'Influence on your childbirth plan', 'Waiting time for service', and 'Continuity of care provision when change of professional (across disciplines)'. Overall, the measurement of antenatal experiences after childbirth results in elevated variability of experiences across clients, with the overall trend that scores after birth are somewhat lower than before childbirth. Additionally, the gap between antenatal and postnatal measurement is (partly) associated with clients' experiences during childbirth and postnatal care and with professional discontinuity during childbirth, but it is unrelated to the perceived health outcome.
Table 3. The association between the late antenatal experiences measured during pregnancy and after childbirth, expressed as having a negative experience, below the median score and mean score (n = 462).
One key result is that the antenatal experience score measured after childbirth was only moderately associated with the antenatal experiences measured before childbirth, irrespective of the scoring model applied. In contrast, the personal, setting, domain and item scores were more strongly associated for having a negative experience than for the median and mean scores. One explanation for this is that a negative experience lingers better in one's memory than an equally moderate or good experience, as shown in decision and judgment theory (Kahneman & Tversky, 1979;Redelmeier & Kahneman, 1996;Redelmeier, Rozin & Kahneman, 1993). An alternative explanation is of a statistical nature: changes in experiences are less easy to capture using a dichotomous measure like the negative score, producing much more agreement between the test and the retrospective test. The same argument, however, does not apply to the dichotomous median score. For the negative score, the cut-off has a fixed definition and is therefore absolute. In contrast, the cut-off for the median score equals the median of the distribution of the summary and domain scores 'as observed', and is therefore a relative position. Furthermore, the odds of having a negative experience increase with the number of items, whereas the odds of having an experience score equal to or above the median are independent of the number of items.
In the ideal situation, a strong association between the antenatal experiences measured before and after childbirth is expected and desired. Furthermore, valid measurement of antenatal experiences postnatally should not be systematically affected by the care process, experiences or outcomes that occur after antenatal measurement. However, our results strongly suggest the opposite: women's experiences with childbirth and postnatal care had a positive and systematic impact on the antenatal experiences measured postnatally. One possibility is that women's response scales changed after birth. It is well known from research on judgment and decision (Stiggelbout & De Vogel-Voogt, 2008) and response shift (Rapkin & Schwartz, 2004;Schwartz et al., 2007;Sprangers & Schwartz, 1999), that pre-treatment judgment scales may differ systematically from post-treatment scales with, in our case, childbirth as the so-called catalyst. A change of reference frame or internal standards of comparison might result in scale recalibration (Rapkin & Schwartz, 2004;Schwartz et al., 2007;Sprangers & Schwartz, 1999;Stiggelbout & De Vogel-Voogt, 2008). The change comparison process may be related not only to a change of status quo, but also to the change of women's affect and mood after childbirth (Stiggelbout & De Vogel-Voogt, 2008). Another possibility is that retrospective judgment of past experiences invokes the risk of memory errors. Recall bias, i.e., 'wrong' assessment post-hoc of a former outcome (Blome & Augustin, 2015), may have occurred under the influence of childbirth and/or postnatal events or experiences. Another form of memory error, so-called hindsight bias (i.e., the influence of outcome knowledge on memory reconstruction, increasing the predictability of the outcome) is less likely as (favorable) childbirth and postnatal experiences contributed positively to the gap between antenatal and postnatal measurement instead of bridging it (Fischhoff, 2003).
In the ideal situation, the gap between antenatal and postnatal measurement should be independent of the care process and intervention determinants. Overall, effect sizes of these variables were moderate to negligible and not significant. One exception to this is professional continuity during childbirth, which had a significant impact on the antenatal experiences measured after childbirth. This is probably due, at least in part, to clients' expectations: a new professional during childbirth is never as well informed about a client's wishes and customs as her attending professional during pregnancy, and trust between the new health care professional and the client is lacking. Even though the antenatal health care professional could (and should) inform a client that a transfer during childbirth is possible, clients may not feel prepared for a change of professional.
Surprisingly, the perceived health outcome of mother and child had no impact on the antenatal experiences measured after childbirth. This is in contrast with literature, which suggests that, in retrospect, when women after childbirth recollect their antenatal experiences, these experiences could adapt in the direction of the (perceived) health outcome during childbirth; i.e., hindsight bias (Fischhoff, 2003;Pohl, Bender & Lachmann, 2002;Ruoss, 1997). One explanation is that hindsight bias did not occur in our case. Another explanation is that clients do not perceive a relationship between the health outcomes of childbirth and the experiences during pregnancy, as different services are provided, often by different health care professionals and often in different settings.
Another surprise is that none of the included socio-demographic determinants were significantly associated with the gap between the test and the retrospective test. This is contrary to the results of research on judgment and decision (Stiggelbout & De Vogel-Voogt, 2008) and response shift (Rapkin & Schwartz, 2004;Schwartz et al., 2007;Sprangers & Schwartz, 1999). Several explanations can be put forward. Firstly, contrary to Sprangers & Schwartz, a change of antenatal and postnatal scales (recalibration, with childbirth as the so-called catalyst) did not occur or the change was small or undetectable. Secondly, several studies suggest that the agreement between the test and retrospective test is similar between subgroups, even though the experiences are different (Britton, 2012;Quintana et al., 2006;Raleigh et al., 2010;M Scheerhagen, E Brinie, A Franx, HF Van Stel, GJ Bobsel, 2014-2015). Stated otherwise, the effect may have been cancelled within patients or even be unrelated to patient characteristics. Thirdly, the socio-demographic characteristics do not directly affect the experience scores but only exert an indirect effect, through influencing the client's mechanisms to accommodate the change in her situation (here: childbirth) (Rapkin & Schwartz, 2004;Schwartz et al., 2007;Sprangers & Schwartz, 1999). Consequently, the impact of socio-demographics may already be incorporated in the impact of previous experiences. Fourthly, our sample was too small to detect any impact of socio-economic status and ethnicity on the antenatal experiences measured after childbirth. However, that argument did not apply to marital status, maternal age and parity, which are socio-demographic characteristics that did not qualify for the multivariable analyses. Finally, we may have omitted relevant variables, e.g., personality traits or affect and mood (Saposnik et al., 2016;Stiggelbout & De Vogel-Voogt, 2008).
Our study in maternity care is a specific case of a general problem; as such, it provides a warning for similar studies. Measurement problems may occur when experiences with care are evaluated but adjacent care episodes are different in terms of disease course or severity or care provided (e.g., in terms of professionals involved, locations) and separated by a critical event which could serve as 'catalyst' (e.g., intervention, hospitalization, complication). A possible change in patients' pre- and post-'catalyst' response scales and the risk of memory errors when patients' experiences are measured afterwards may result in reduced validity and/or reliability of measurements. To avoid these risks, we recommend that patient experiences with care be measured within their own care episode.

Strengths & limitations
One strength of this study is that, to our knowledge, this is the first study exploring the validity of clients' antenatal experiences measured after childbirth. Nevertheless, several limitations merit discussion. Firstly, women with a low educational level, non-Western women, women <24 years of age, and setting continuity (referral to secondary care) were slightly underrepresented compared to the national pregnancy population (PRN Foundation, 2013), despite considerable efforts to adapt the questionnaire and other measures taken to further the participation of these groups. Our results suggest, however, that these variables are all unrelated to the gap between the antenatal and postnatal measurements. Parity, induced labour, mode of delivery, and maternal and neonatal admission rates were comparable to the national average. National data on professional continuity are lacking, but our data are comparable to those of one of our other studies (n = 3,479 women; M Scheerhagen, E Brinie, A Franx, HF Van Stel, GJ Bobsel, 2014-2015). Secondly, we did not register whether the clients' situation changed during the interval between test and retrospective test other than the events, experiences and perceptions during childbirth and postnatal care. It is possible that omitted variables could further modify the gap between test and retrospective test.

Conclusion
Clients' experiences during pregnancy, childbirth and postnatal care are often measured for quality improvement cycles. We recommend measuring the antenatal experiences in late pregnancy instead of after childbirth, as the agreement between the antenatal experiences measured before and after childbirth is overall moderate for the summary scores.
The gap between antenatal and postnatal measurement is (partly) associated with clients' experiences during childbirth and postnatal care and with professional discontinuity during childbirth. Furthermore, measuring the antenatal experiences during pregnancy is the gold standard from a psychometric point of view. From an efficiency point of view, one could also argue for measuring the antenatal experiences after childbirth and adjusting the data to match the experiences of the gold standard.