A randomized controlled trial of a 5‐year marriage checkup booster session for a subsample of responder couples

Abstract
This study examined maintenance and booster effects of a brief couple intervention, the Marriage Checkup (MC), across 5 years. A subsample of 63 couples who benefitted from two previous MCs (responder couples) were randomly assigned to a third MC or control. Before randomization (at 4-years-9-months), the responder sample had maintained small to medium effects on two measures of relationship functioning. After randomization, we found no significant between-group effects. Yet, within-group analyses revealed that while control couples showed flat trajectories in all outcomes after the 4-years-9-months baseline, couples receiving a third MC (at Year 5) reported small to medium improvements in three measures of relationship functioning and maintained a follow-up effect in one measure. Findings indicate that couples who initially improved from the MC can maintain some of their improvements over long periods. The potential of boosting such improvements with recurrent MCs is a relevant target for further investigation in larger samples.


INTRODUCTION
Couple distress and conflicts are common phenomena and have consistently been linked with reduced physical and mental health (e.g., Robles et al., 2014; Whisman, 2007). Couple therapy is effective (Roddy et al., 2020), yet most couples do not seek this type of help, or they delay help-seeking until problems have accumulated to a point where the efficacy of couple therapy is reduced (for a review of this point see Stewart et al., 2016). Barriers to seeking professional help include social stigma, lack of money and time, privacy concerns, and lack of adequate services (e.g., Williamson et al., 2019). These barriers challenge the reach and timely dissemination of interventions crucial for preventing and treating couple distress.
The marriage checkup (MC; Cordova, 2014) is a brief, low-cost, and empirically supported couple intervention designed to lower the barriers to seeking professional help. Equivalent to other checkups (e.g., physical health), the MC provides regular contacts with a professional and is designed to bridge the gap between universal prevention and indicated treatment for couples. Promoting "relationship health maintenance, early problem detection, and early intervention" (p. 593), the MC has been found to attract couples across the continuum from happy to severely distressed and to reach couples who have never previously sought professional help for their relationship (Cordova et al., 2005; Morrill et al., 2011). Likewise, couples perceive the MC as more accessible and less intimidating than traditional therapy (Morrill et al., 2011). When imported to Denmark in 2016, the MC was adapted to private practice, and the Danish format consists of two joint 90 min sessions (assessment and feedback; Trillingsgaard et al., 2017). The MC aims to foster intimacy in couples and builds on methods from integrative behavioral couple therapy (Christensen et al., 2020), motivational interviewing, and relationship health education (Cordova, 2014). Though manualized, the therapist flexibly tailors the MC to the couple's unique strengths and concerns and identifies the most adequate plan for subsequent caretaking of the specific relationship.
Randomized controlled trials in the United States and in Denmark have found small to medium effects of two annual MCs on relationship functioning as well as on individual depression symptoms across 1 or 2 years (e.g., Cordova et al., 2014; Gray et al., 2020). The MC has been successfully adapted to different populations including military couples (Cordova et al., 2017), perinatal couples (Darling et al., 2021), low-income at-risk couples (Gordon et al., 2019), Korean couples (Lee & Kwon, 2018), lesbian couples (Minten & Dykeman, 2019), transgender couples (pilot study by Minten & Dykeman, 2021), and couples who disagreed on relationship concerns (Reyes et al., 2020). Taken together, the MC is classified as a well-established intervention meeting the criteria for the highest level of evidence outlined by Wittenborn and Holtrop (2021; adapted from Southam-Gerow & Prinstein, 2014), as also concluded by Doss et al. (2021).
In terms of improving relationship health, results of the previous RCTs visualized a trajectory of change shaped as a climbing M: small positive increases after the first MC, then a small decrease followed by small to medium increases after the second MC. Most measures even indicated an anticipation effect (Cordova, 2014), with increases in relationship functioning prior to the second MC. Some decreases were found at follow-up, but most measures maintained significant improvements. These findings reflect the assumption that the MC (as any checkup model in its definition) promotes healthy relationship maintenance through brief but regular care (e.g., Cordova et al., 2014). Inherent in this assumption is that longitudinal effects of the MC depend on recurring booster sessions over time. To date, no study has followed couples through more than two annual MCs and beyond 2 years, which leaves the longitudinal effects of the MC untested.
A checkup model for whom and when: Selecting a sample of responder couples
A well-established finding from longitudinal studies of couples is that their average level of relationship satisfaction generally declines over time (see e.g., the systematic review and meta-analysis by Bühler et al., 2021). This consistent average decline could lead one to argue that all couples are relevant targets for preventive interventions. However, recent research using latent class approaches shows substantial heterogeneity in trajectories of relationship satisfaction, so that a minority of couples experience steep decline over time, while the majority (50%−90%) of couples show insignificant or minimal decline (for a review of the past two decades of research in the field see e.g., Fentz et al., 2022; Karney & Bradbury, 2020; Proulx et al., 2017). Furthermore, couples' low initial satisfaction level is predictive of deterioration, while initially satisfied couples usually stay satisfied over time (Fentz et al., 2022; Karney & Bradbury, 2020). Previous RCTs of the MC recruited couples across the full spectrum of relationship satisfaction. Although these studies found small to medium average effects on relationship functioning from receiving two MCs, these effects cover a considerable heterogeneity, with only 10.2%−33.3% of the couples reaching a reliable change (depending on measurement and timepoint, based on the Reliable Change Index; Jacobson & Truax, 1991). Taken together, findings imply that not all couples are equally relevant recipients of preventive interventions.
Couples who did not meet the criteria for a reliable change may be couples who (a) received the MC at a happy time of their relationship with little room for improvement and a prognosis of stability, (b) were severely distressed and in need of a different type of help (e.g., couple therapy, individual therapy), or (c) received the MC too late in terms of relational damage and lack of commitment to invest in the relationship. Outside a research setting serving free recurring MCs for all couples, these types of nonresponder couples would probably be served a first checkup and then referred to either a more intensive service (e.g., couple therapy or divorce counseling) or a less intensive service (self-directed maintenance of the relationship). Each type of nonresponder couple would be a less optimal candidate for recurrent checkups, as the timing and dose of an additional MC might not be appropriate. From a cost-effectiveness perspective, the most relevant recipients of recurrent MCs are responder couples for whom the first MC was an adequate and beneficial dose of care.
JOURNAL OF MARITAL AND FAMILY THERAPY

Aim
The current RCT aimed to test whether couples who have benefitted from two MCs can maintain and boost their relationship functioning with a third MC provided at Year 5. Using a sample of responder couples, we target this aim with three main analyses of (1) the long-term maintained effects of the two previous MCs prior to the current baseline, (2) the between- and within-group booster effects of the third MC using the current baseline, and, finally, (3) the between- and within-group booster effects of the third MC using the original baseline in the previous RCT in order to graph the outcome variables across the full study period of 5 years and 3 months.

Design
This RCT extends a previous RCT in which 233 couples were randomized to an intervention group receiving two MCs scheduled 1 year apart or a control group receiving movie tickets and a feedback report. Details and results are described in the report of that trial.
In the current study, we randomized couples, who benefitted from the two previous MCs (referred to as responder couples), to either a new intervention condition receiving a third MC (n = 32) or a control condition receiving no further intervention (n = 31). This re-randomization took place 4 years and 9 months after the original baseline. The intervention was conducted by one of five trained psychologists at one of two sites, the private clinic of Center for Family Development in Copenhagen, or the university clinic at Aarhus University. Data were collected through online surveys at the 4-year-9-months baseline (Week 0), prior to the MC (Week 16, intervention group only), 2 weeks after (Week 18), and at follow-up (Week 28).

Recruitment and inclusion procedure
Couples were eligible for the current study if they were responders defined by the following criteria: Both partners had provided data at the original baseline (Week −244, see Figure 1) and at 2 weeks after the second MC, and at least one partner showed a positive reliable change ($1.96 \times \sqrt{2 \times \mathrm{SE}^2}$, where SE is the standard error of measurement; Jacobson & Truax, 1991) from the original baseline (Week −244) to after the second MC (Week −190) on at least one of the four outcome variables. A medium size of effect was reasonable to expect from a sample of previous responders. When accounting for an attrition rate of 20%, a power analysis indicated that a sample of 72 couples would be needed to detect a between-group difference with a medium effect size (Cohen's d of 0.65) with a power of 0.8. We classified all couples in which both partners had answered at both timepoints (n = 95 couples) into responders versus nonresponders, resulting in 65 eligible couples. To allow for attrition, we lowered the criterion for a reliable change by 20%, resulting in a total sample of 76 couples (80% of the 95 couples). Invitations were sent to each partner by e-mail and included a personal login to the online registration and consent form. We enrolled couples across two waves, May and August 2018. Figure 1 illustrates the 5-year flow of the participants beginning with the original RCT and continuing through the current RCT. Prior to the current baseline survey (Week 0), 6 out of the 76 couples were excluded because they were no longer a couple or were currently receiving couple therapy, and 7 couples declined the invitation, expressing that they were satisfied with their relationship and therefore in no need of intervention. When both partners had answered the current baseline survey (Week 0, N = 63 couples), they were immediately rerandomized using sequentially numbered, opaque, sealed envelopes (Doig & Simpson, 2005).
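The responder criterion can be illustrated with a minimal sketch. It assumes the standard error of measurement is obtained as SD × sqrt(1 − reliability), following Jacobson and Truax (1991); the function names, the standard deviation of 10, and the reliability of 0.84 are hypothetical illustration values, not figures from this study.

```python
import math


def standard_error(sd, reliability):
    """Standard error of measurement (assumed: SD * sqrt(1 - reliability))."""
    return sd * math.sqrt(1.0 - reliability)


def reliable_change_threshold(se, z=1.96):
    """Smallest change counted as reliable: z * sqrt(2 * SE^2)."""
    return z * math.sqrt(2.0 * se ** 2)


def is_responder_couple(changes_a, changes_b, thresholds):
    """True if at least one partner exceeds the threshold on at least one outcome.

    changes_a / changes_b map outcome name -> change score (post minus baseline);
    thresholds maps outcome name -> reliable change threshold.
    Higher scores are assumed to mean better functioning on every outcome.
    """
    return any(
        change > thresholds[outcome]
        for changes in (changes_a, changes_b)
        for outcome, change in changes.items()
    )


# Hypothetical values for illustration only.
se = standard_error(sd=10.0, reliability=0.84)
threshold = reliable_change_threshold(se)
print(round(threshold, 2))
print(is_responder_couple({"CSI": 12.0}, {"CSI": 3.0}, {"CSI": threshold}))
```

Under these assumed inputs, a partner would need to improve by a little more than 11 points to count as reliably changed, and one such partner is enough for the couple to be classified as a responder.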
No monetary incentives were given to participants but couples in the control group were compensated at Week 16 with a gift consisting of a card game for couples. All 63 couples were included and analyzed within an intention to treat (ITT) approach. All study procedures complied with standards from the regional ethical committee, and the study was approved by the Danish Data Protection Agency.

Relationship satisfaction
Global relationship satisfaction was measured using the marital satisfaction inventory-brief (MSI; 10 items; Whisman et al., 2009) and the couple satisfaction index (CSI; 16 items; Funk & Rogge, 2007). On the CSI, respondents rate their level of satisfaction on 6- or 7-point Likert scales with a possible sum score ranging from 0 to 81 and a higher score indicating higher satisfaction. Item samples of the CSI include "My relationship with my partner makes me happy" (rated from 0 = not at all true, to 5 = completely true) and "Please indicate the degree of happiness, all things considered, of your relationship" (rated from 0 = Extremely unhappy, to 6 = Perfect). We defined a couple as experiencing relational distress if at least one partner scored below 51.5 on the CSI (Funk & Rogge, 2007). The CSI showed high internal reliability in the current study (Cronbach's α at Week −244 = 0.97; N = 462 participants). The MSI was originally developed to screen for relationship discord and is composed of 10 items deriving from five different scales of the MSI-Revised (Whisman et al., 2009). Items are rated dichotomously (Yes or No) with a possible sum score ranging from 0 to 10 and a higher score indicating less distress. Item samples of the MSI include "There are some serious difficulties in our relationship" and "Our sexual relationship is entirely satisfactory." The Cronbach's α of the MSI in the current study was 0.66 (Week −244), which is considered acceptable given that the MSI measures different relationship functioning domains. Though deriving from different scales, the 10 items of the MSI have been found to measure the same underlying latent construct of relationship distress (Balderrama-Durbin et al., 2015).
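The couple-level distress rule described above can be written as a short sketch. The function and constant names are hypothetical, and the example scores are invented for illustration; only the 51.5 cut-off and the "at least one partner" rule come from the text.

```python
CSI_DISTRESS_CUTOFF = 51.5  # cut-off stated in the text (Funk & Rogge, 2007)


def couple_is_distressed(csi_partner_a, csi_partner_b, cutoff=CSI_DISTRESS_CUTOFF):
    """A couple counts as distressed if at least one partner scores below the cut-off."""
    return csi_partner_a < cutoff or csi_partner_b < cutoff


# Invented example scores: one partner above and one below the cut-off.
print(couple_is_distressed(60, 50))
print(couple_is_distressed(60, 55))
```

Note that the rule is asymmetric by design: a single low-scoring partner is sufficient, so the couple-level rate will be at least as high as the individual-level rate.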

Intimacy
Intimacy was measured using the intimacy safety questionnaire (ISQ; 27 items; Cordova et al., 2005). Respondents rate their feeling of intimacy with their partner on 5-point Likert scales, and a mean score is calculated ranging from 0 to 4 with a higher score indicating more intimacy. Item samples include "I feel comfortable telling my partner when I'm feeling sad" (0 = Never, 4 = Always) and "I feel like I have to watch what I do or say around my partner" (0 = Always, 4 = Never). The ISQ showed high internal reliability in the current study (Cronbach's α = 0.90 at Week −244).

Responsive attention
The participant's perception of their partner's daily provision of responsive attention was measured with the responsive attention scale (RAS; 12 items; Trillingsgaard & Fentz, 2016). Items are rated on 5-point Likert scales with a possible sum score ranging from 12 to 60 and a higher score indicating higher perceived responsive attention. Item samples include "I receive a warm welcome when we meet at the end of the day" and "If I tell my partner about my day, he or she listens with interest" which are rated from 1 = Very rarely to 5 = Very often. The Cronbach's α of the RAS was acceptable (α = 0.81; Week −244).

Therapist manual adherence
All MC sessions were videotaped, and manual adherence was coded on a random sample of 20% of the tapes by two independent raters following the procedure of . Therapist behavior was rated from 0 (Did not adhere to manual) to 5 (Completely adhered to manual) on 14 elements during the assessment and feedback sessions of the MC.
The average adherence rating was 4.77 (range from 3.88 to 5.00), indicating that therapists overall adhered to the MC manual. The interrater reliability was good as coders agreed within one level of the scale in 89.3% of their ratings.
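The agreement figure above is a proportion of paired ratings that fall within one scale level of each other. A minimal sketch of that calculation follows; the function name and the example ratings are hypothetical and are not taken from the study's coding data.

```python
def agreement_within(ratings_a, ratings_b, tolerance=1):
    """Share of paired ratings whose absolute difference is at most `tolerance`."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters need the same, non-zero number of ratings")
    hits = sum(abs(a - b) <= tolerance for a, b in zip(ratings_a, ratings_b))
    return hits / len(ratings_a)


# Invented ratings on the 0-5 adherence scale for four manual elements.
rater_1 = [5, 4, 3, 5]
rater_2 = [5, 5, 1, 4]
print(agreement_within(rater_1, rater_2))
```

This index is more lenient than exact agreement or a chance-corrected coefficient, which is worth keeping in mind when comparing it with reliability figures from other studies.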

Data analyses
Data were analyzed in SPSS 27. The data set can be obtained from the last author upon request. We tested whether the current intervention group (receiving a third MC) and control group differed on sociodemographic characteristics at the original baseline (Week −244) using Pearson's χ² tests for categorical variables and MANOVA for continuous variables. We also included two other groups in this comparison: couples classified as nonresponders (n = 19) and the original control group from the previous RCT who had never received any MC (n = 117; see Table 1). All other analyses in this study exclusively included the current responder sample (63 couples). To investigate the long-term maintained effects from the original RCT baseline (Week −244) to the current baseline (Week 0), we conducted paired t-tests on the pooled responder sample.
To evaluate the treatment effects of the third MC, we used a dyadic score model following recommendations by Iida et al. (2018) using partner means ([female partner + male partner]/2) and partner differences ([female partner − male partner]/2) in relationship outcome variables. We used this dyadic score model to account for the considerable shared variance in the outcome variables, for example, the correlations between partners in baseline scores (at Week 0, rs = 0.33−0.49) and in change scores (from Week 0 to 28: rs = 0.22−0.33). We carefully tested for gender differences and effects in partner means and in partner differences. Although male partners were generally more satisfied than female partners (see Figure A1 in the appendix), we found effects only in partner means and not in partner differences. Thus, we report the more parsimonious model focusing on partner means. While the trajectories of change in the control group fitted a linear model, the trajectories among couples in the intervention group changed direction. To allow for these breaks, we built a factorial multilevel model with a random intercept and couple as clustering variable, where each timepoint was added as a dummy-coded (0/1) factor. We tested the between-group difference in change and the within-group change in two models. First, we modeled change in the current RCT as change from the current baseline (Week 0) to subsequent timepoints at Week 16 (only for the intervention group), 18, and 28, respectively. Second, we modeled change across all 5 years using the original RCT baseline (Week −244) as the reference and estimating change to after each of the three MCs (the third at Week 18) and to the 5-year-3-months follow-up (Week 28). Reporting of the within-group effects was included because the sample is relatively small and because both groups originate from the same condition (receiving two MCs) and constitute a relatively homogeneous sample of responder couples.
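The two data-preparation steps named above, dyadic scores and 0/1 timepoint dummies, can be sketched as follows. This is only an illustration of the transformations, not the multilevel model itself (which was fitted in SPSS); the function names and example scores are hypothetical, and the timepoint set reflects the first model with Week 0 as the reference.

```python
def dyadic_scores(female_score, male_score):
    """Return (partner mean, partner difference) as defined in the text."""
    partner_mean = (female_score + male_score) / 2
    partner_difference = (female_score - male_score) / 2
    return partner_mean, partner_difference


def time_dummies(week, reference=0, later_weeks=(16, 18, 28)):
    """Dummy-code a timepoint against the reference week (all zeros = reference)."""
    if week != reference and week not in later_weeks:
        raise ValueError(f"unknown timepoint: {week}")
    return {f"week_{w}": int(week == w) for w in later_weeks}


# Invented scores: the individual scores can be recovered as mean +/- difference.
mean_score, diff_score = dyadic_scores(60, 50)
print(mean_score, diff_score)
print(mean_score + diff_score, mean_score - diff_score)
print(time_dummies(18))
```

Because each timepoint gets its own dummy, the model does not force a straight line through the trajectory, which is what allows the change in direction seen in the intervention group to be captured.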
We calculated Cohen's d effect sizes by dividing the estimated effect at each timepoint by the pooled standard deviation of the 63 couples at either the current baseline or the original baseline.
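As a rough sketch of this effect size calculation: the code below assumes that "pooled" means combining the within-group variances of the two study arms weighted by their degrees of freedom, which the text does not spell out. Function names and the example numbers are hypothetical.

```python
import math
import statistics


def pooled_sd(*groups):
    """Pooled standard deviation across groups, weighted by degrees of freedom."""
    numerator = sum((len(g) - 1) * statistics.variance(g) for g in groups)
    denominator = sum(len(g) - 1 for g in groups)
    return math.sqrt(numerator / denominator)


def cohens_d(estimated_effect, baseline_sd):
    """Model-estimated change divided by the pooled baseline standard deviation."""
    return estimated_effect / baseline_sd


# Invented baseline scores for two small groups and an invented estimated effect.
sd = pooled_sd([1, 2, 3], [2, 4, 6])
print(round(sd, 4))
print(round(cohens_d(0.79, sd), 2))
```

Standardizing by a baseline standard deviation, rather than by the standard deviation of change scores, keeps effect sizes comparable across timepoints because the denominator stays fixed.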
We were able to obtain data on 95.5% of the maximum possible number of observations across Week 0−28 (844 obtained observations out of 884 possible observations).
As the MC was developed for couples across the spectrum of relationship satisfaction, we conducted sensitivity analyses for the influence of severely distressed couples on treatment effects, where we identified couples as outliers if they showed one or more "probable outlying" low scores (|z| > 2.58, equaling the 1% most extreme cases; Field, 2018) on at least one of the four outcome variables at Week 0, 16, 18, or 28.
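The outlier screen can be sketched as below. The sketch assumes z-scores are computed within one outcome at one timepoint using the sample mean and standard deviation, and that only the low tail is flagged because the concern is severe distress; both points are assumptions, and the function name and example scores are hypothetical.

```python
import statistics

Z_CUTOFF = 2.58  # threshold cited in the text (Field, 2018)


def low_outlier_indices(scores, cutoff=Z_CUTOFF):
    """Indices of scores lying more than `cutoff` standard deviations below the mean."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    if sd == 0:
        return []  # no variation, so nothing can be an outlier
    return [i for i, x in enumerate(scores) if (x - mean) / sd < -cutoff]


# Invented scores: nineteen identical values and one very low value.
scores = [50] * 19 + [10]
print(low_outlier_indices(scores))
```

A couple would then be treated as an outlier if this screen flags either partner on any of the four outcomes at any of the listed timepoints.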

RESULTS
The sociodemographic characteristics of the couples randomized to the intervention (third MC) or the control condition in the current study are described in Table 1. This table also outlines characteristics of the nonresponder couples and the original control group from the previous RCT for comparison. These four groups did not differ significantly (ps > 0.131) at the original baseline (Week −244) with the single exception of fewer dual employed couples among nonresponders compared to the current intervention group (p < 0.01, Cramer's V = 0.39).
Means and standard deviations of the four outcome variables at all 11 timepoints for the four groups are reported in Table A1 in the appendix.

Long-term maintenance of effects from original RCT to current baseline
The current responder couples (N = 63) had maintained small to medium long-term effects of the two previous MCs across the 4 years and 9 months from their original baseline level (Week −244) to the current baseline (Week 0) on MSI (d = 0.51, p < 0.001) and ISQ (d = 0.39, p = 0.003). The RAS trended toward a significant maintained effect (d = 0.24, p = 0.058), while no effect was maintained for the CSI (d = 0.00, p = 0.990). At the current baseline, 22 of the 63 couples (34.9%) had at least one partner reporting relational distress.

Booster effects of the third MC across 6 months
Results of the third MC (completed by 30 couples) are presented in Table 2. No significant baseline differences (at Week 0) were seen between the current intervention and control group in any of the outcome variables (ps > 0.367), as expected following random assignment. A possible source of bias was found in that three couples in the intervention group, but none in the control group, were identified with outlying low scores (indicating high level of distress). Sensitivity analyses revealed that our ITT results were not considerably changed by omitting the three outlying (highly distressed) couples. Still, to address these potential outliers, we visualized trajectories both with ( Figure

Maintained and booster effects of three MCs across 5 years
Effects in our responder couples of each of the three MCs (two received in the original RCT and one received in the current RCT) are presented in Table 3 and Figure

DISCUSSION
This RCT examined the longitudinal effects of the MC following responder couples through two or three MCs across more than 5 years. In brief, these couples who had benefitted from two previous MCs showed maintenance of small to medium effects on two out of four outcome measures (MSI and ISQ) at 4 years and 9 months. Thereafter, the changes in couples who received a third MC were not significantly different from the changes in couples who only received the first two MCs. Across the period of the third MC, within-group analyses revealed that while couples in the current control group showed flat trajectories, couples receiving a third MC experienced small to medium within-group booster effects on three out of four measures (MSI, CSI, and RAS) with one measure (MSI) maintaining a small to medium effect at the 5-years-3-months follow-up. Generally, these boosts around the third MC repeat the climbing M pattern of trajectory seen in couples receiving the two previous MCs.
Couples receiving a third MC showed small positive anticipation effects on two outcomes, similar to findings from the original RCT. Potential explanations for these anticipation effects could be out-of-session mechanisms such as relationship reappraisal, behavioral activation, and motivation from social control and social desirability. Cordova (2014) describes how such out-of-session mechanisms may be activated already by the reminder of an upcoming MC when partners turn their attention to the relationship with increased care, comparable to the increased brushing and flossing up to a dental checkup. Together with the unexpected small negative anticipation effect in intimacy (though followed by pre-to-post improvement), our findings may suggest that couples, when reminded of an upcoming checkup, experience a blend of positive and negative reappraisals of their relationship. Couples may recommit to the relationship in global terms (the positive anticipation effects on MSI and CSI) while also acknowledging some decline in more specific terms (such as intimacy). Generally, we found that the within-group effects of the third MC were not as large as those of the second MC. This contradicts our initial expectation of larger effects among responder couples. Four circumstances of the third MC may help to explain this unexpected finding. First, at the current 4-years-9-months baseline, our responder sample had maintained small to medium effects on half of the measures. This maintenance of effects is rather impressive given the long period of time (4 years) from the second MC to the third MC, and, at the same time, it may leave less room for further improvements. Also, the absence of a boost in intimacy by the third MC may be explained by intimacy theory in that intimate events gradually build acceptance and intimate safety (Cordova, 2014). This nature of intimacy is reflected in its steadier linear growth with more stability across the long period after the second MC in contrast to the decreases found in the climbing M trajectories of the three other measures. This difference in trajectories between intimacy and some measures of relationship satisfaction was also found in previous studies (e.g., Cordova et al., 2014). Second, whereas the first two MCs were scheduled annually, the interval between the second and third MC was much longer (4 years). This time delay may cause couples to retrieve less of their previous learnings, reflecting a delayed reinforcement of the booster effects. Third, after the second MC, couples did not expect any further checkups; thus, the invitation to a third MC (and to participate in the current study) was unexpected. Effects may be larger when MCs are anticipated and part of a recurrent scheme. Fourth, most couples (n = 24; 77%) got a new therapist for the third MC, which could have reduced the benefits of an established therapeutic alliance from the first two MCs (Hughes et al., 2021). Future studies should test the cumulative effect of recurrent MCs by keeping the therapist and the spacing between MCs consistent.
FIGURE 3 Outcome trajectories across 5 years and 3 months in the current intervention and control group, the nonresponder group, and the original control group (who did not receive any MC). Three outlying couples omitted. Graphs are based on the raw partner means since not all groups and timepoints were included in the current models. Y-axes are sized to two standard deviations. The times of each MC are marked with thin vertical lines. MC, marriage checkup.
One measure of relationship satisfaction (MSI) showed generally stronger maintenance of effects than the other measure (CSI), which could be explained by characteristics of the two measures. Item response theory-based examination of the MSI has found it to discriminate distressed couples particularly well (Balderrama-Durbin et al., 2015). Specifically, Balderrama-Durbin et al. (2015) found that respondents had to be relatively distressed before they "tipped" from being more likely to report "not concerned" to being more likely to report "concerned" on most of the MSI items (binary scoring of Yes or No). A sum score at or below 6.00 on the MSI indicates relationship discord (i.e., four or more areas of concern; Whisman et al., 2009). Couples in the current study exceeded this cut-point already after the first MC (Week −210 for the intervention group and Week −242 for the control group) and stayed above this cut-point at all subsequent timepoints, with only one exception for the intervention group (a small drop at Week 0). Thus, small changes in relationship distress among the nondiscordant couples in the current study may be better captured by the 6- or 7-point Likert scales of the CSI.
It follows from our selection of a responder sample that our results cannot generalize to the full population. Though the recurring MCs seemed like a beneficial dose and type of help for the current responder couples, this may not generalize to the nonresponder couples. Based on panel plots of the 2-year trajectories in the 19 nonresponder couples excluded from this study, they were characterized by being either (a) constantly distressed throughout the period with no or close to no response to the two MCs, (b) responding positively to the MCs but quickly dropping down to their baseline, or (c) constantly satisfied. From the perspective of both cost-effective and ethical delivery of help, the adequacy and necessity of the MC will depend on the individual couple's need and response. For distressed couples who do not respond to the first MC, this intervention is best conceptualized as an important steppingstone toward more intensive interventions tailored to their specific needs.
Finally, findings should be interpreted in light of some limitations. First, this study was underpowered to detect statistically significant small effects, which entails a risk of a type II error. The current effect sizes (e.g., the between-group effects of 0.33 and 0.31 found in the MSI) call for future replications with larger samples to detect potential between-group effects with a power of at least 0.80. As the MC is a brief intervention offered to couples across the continuum of distress, small effects may represent meaningful benefits in a public health perspective. Second, our design did not allow us to attribute any of the effects to mechanisms of change specifically linked to the MC. Studies comparing the MC to another intervention or studies specifically designed to disentangle mechanisms of change are needed for that purpose.

CONCLUSION
In conclusion, this study indicates that couples who benefit from two annual MCs can maintain some of these improvements in relationship health across long periods of time, while the ability to boost such improvements with a third MC was not clearly supported. Thus, improvements for couples assigned to a third MC were not statistically greater than the longitudinal effects seen for control couples who similarly benefitted from two MCs (between-group effects). Our results could, however, point to the importance of further examination of the longitudinal booster effects of regular MCs in a larger sample, as the current study did not have power to detect small between-group effect sizes. The average size of effects achieved from the MC model may not point to this intervention as a stand-alone treatment for chronic relationship distress, but the current results point to the MC as a relevant and tolerable intervention that supports couples in sustaining relationship functioning across longer periods of time. As such, the MC model should be considered as an integrated part of a more comprehensive public health strategy for preventing couple distress in the future.

APPENDIX
FIGURE A1 Gender differences in outcome trajectories across 5 years and 3 months in the current intervention and control group (ITT). Graphs are based on the raw partner means since not all timepoints were included in the current models. Y-axes are sized to two standard deviations. The times of each MC are marked with thin vertical lines. ITT, intention to treat.
[Table or figure residue: columns Week −210, Week −197, Week −190, Week −140, Week 0, Week 16, Week 18, Week 28; row label: Marital satisfaction inventory-B, current intervention group]
FIGURE A3 Outcome trajectories across 5 years and 3 months in the current ITT intervention and control group, the nonresponder group, and the original control group (who did not receive any MC). Graphs are based on the raw partner means since not all groups and timepoints were included in the current models. Y-axes are sized to two standard deviations. The times of each MC are marked with thin vertical lines. ITT, intention to treat; MC, marriage checkup.