Introduction

Approximately 1 % of adults suffer from uncontrolled loss of stool [1, 2]. Very few guidelines for managing fecal incontinence in adults are available; one of these was issued by the National Institute for Health and Clinical Excellence for the UK in 2007 [3]. In general, first-line therapy attempts to solidify liquid stools with fiber supplements or medications [4]. Specialized management usually applies nonsurgical methods, such as pelvic floor exercises (PFE), biofeedback (BF), and electrical stimulation (ES) [3], to improve the strength and coordination of the sphincter muscles. However, the internal anal sphincter consists of smooth muscle and is not amenable to voluntary exercise. Furthermore, it is difficult to actively reach the slow-twitch type I fibers [5, 6]. It thus makes sense to use ES on the smooth and slow muscular components.

When judging the efficacy of the ES treatment or the combination of BF and ES (BF + ES), current type, current strength, and application mode are essential [6–10], as recently discussed by Schwandner et al. [11].

For example, Telford and colleagues [5] demonstrated that the motor threshold for sphincter muscles with low-frequency electrical stimulation (LFS) is around 20 mA. Specifically, they used the strength-duration test to measure the current strength required for visible muscle contraction at different pulse durations. They first found that the current intensity at 1 ms pulse duration was the best predictor of incontinence [5]. Furthermore, in their control group [12], they observed that a current of 18.2 mA (90 % reference interval ≥19 mA) was required at 1 ms pulse duration for muscle contraction.

LFS, as used in most studies of fecal incontinence, can be very painful when applied to the pelvic floor [13–15], and it can cause adverse device effects (ADE) [13]. Surprisingly, the relationship between pulse configuration, voltage amplitudes, and physiological outcome has not been addressed in any of the systematic reviews on ES for fecal incontinence, although it is well known in biomedical engineering [6–10]. It was also neglected in some two-group randomized trials in which two ES regimens were compared and both were below the threshold for effectiveness [13]. Stimulation with alternating current at medium frequency (MF > 1,000 Hz; amplitude-modulated medium frequency: AM-MF), also termed pre-modulated interferential ES in the literature, does not have the disadvantage of LFS because its biological effect is based on a different principle than the all-or-nothing effect of LFS [16]. Previous systematic reviews [Supplementary Table S1; 17–19] did not distinguish between LFS and MF therapies.

The systematic reviews focused either on ES [17] or on BF [18, 19], but they did not consider ES + BF as a combination therapy, which would be important for treatment guidelines. Here, we aim to identify the best second-line conservative treatment, consisting of BF, ES, or BF + ES, by taking into account the type, strength, and application mode of current. We specifically address safety issues and duration–response relationships, the latter of which were not correctly reported in previous reviews [19]. By finally grading the evidence [20], we provide valuable aid to decision making in a guideline for fecal incontinence.

Methods

The protocol for this systematic review was published in PROSPERO (CRD42011001334) on June 1, 2011. A detailed description of the methods can be found in the Electronic supplementary material. In brief, we included randomized controlled parallel-group trials of BF, ES, or BF + ES in adults who needed a second-line conservative treatment for fecal incontinence and had no obvious need for surgery. Eligible trials had to report a patient-related outcome, i.e., remission, response, or disease-related quality of life (QoL) on validated scales. Eligibility was assessed by two reviewers in consensus. Data extraction followed recommendations of the Cochrane Handbook for Systematic Reviews of Interventions [21]; for details, see Electronic supplementary material.

Excluded studies and reasons for exclusion are summarized in Supplementary Table S2. Supplementary Tables S3 and S4 provide details on populations, interventions, comparator, outcome, and design of included studies to find matching entries for meta-analysis. Safety issues were coded as serious adverse events (SAE) and adverse device effects (ADE), both according to ISO 14155:2011.

The 2 × 2 treatment scheme was analyzed using an analysis of variance meta-analysis (meta-ANOVA) [22] for the endpoint remission, with the relative risk (RR) as effect measure. Results were graded as high, moderate, or low quality using the Grades of Recommendation, Assessment, Development, and Evaluation (GRADE) [20].
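To illustrate the effect measure named above, the following minimal sketch computes a relative risk and a 95 % confidence interval for a single two-arm comparison. It is not the meta-ANOVA itself. The function name `relative_risk`, the Wald-type interval on the log scale, and the counts are assumptions chosen for illustration; the counts are hypothetical and not taken from any included trial.

```python
import math


def relative_risk(events_a, n_a, events_b, n_b, z_crit=1.959964):
    """Relative risk of arm A versus arm B with a Wald interval on the log scale.

    Illustrative helper (hypothetical name); assumes non-zero event counts
    in both arms so that the standard error is defined.
    """
    rr = (events_a / n_a) / (events_b / n_b)
    # Standard error of log(RR) for two independent binomial proportions
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z_crit * se_log)
    upper = math.exp(math.log(rr) + z_crit * se_log)
    return rr, lower, upper


# Hypothetical example: 20 of 40 patients in remission versus 10 of 40
rr, lower, upper = relative_risk(20, 40, 10, 40)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f}-{upper:.2f}")
```

An RR above 1 with a lower confidence limit above 1 would indicate more remissions in arm A than in arm B at the 5 % level under these assumptions.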

Results

Study selection

We identified 128 potentially relevant studies from titles and abstracts (Supplementary Fig. S1), including several systematic reviews (Supplementary Table S1) [4, 17, 19, 23–36]. Twenty-three studies were retrospective chart reviews or prospective case series [37–59], and four current trials [60–63] were found in trial registries, but not in the literature. Seven RCTs [15, 64–69] met at least one exclusion criterion (Supplementary Table S2), and 13 RCTs remained [11, 14, 16, 18, 70–78] (Supplementary Tables S3 to S5).

Qualitative summary

We identified two different groups of patients in the RCTs. The first group of studies included predominantly younger mothers [70, 73, 74] or females only [16, 75], and the second group included patients of all ages (predominantly elderly) and both sexes (predominantly women) [11, 14, 16, 71, 72, 76–78]; see Supplementary Text 2 of the Electronic supplementary material. In one of the 13 RCTs [78], the active ES treatment and the active ES placebo used currents of 2.3 and 0.13 mA, respectively, both substantially below the therapeutic window of about 20 mA [5].

The quality of the RCTs varied substantially (Supplementary Table S3), and Fig. 1 shows that there was a strong correlation between trial quality and group size (r = 0.81; p = 0.0007). Furthermore, significant trials tended to be of better quality (one-sided, p = 0.0299) and to have a larger group size (one-sided, p = 0.0450).

Fig. 1

Quality of trials by sample size per group. Full and open circles display randomized controlled trials with and without significant differences between treatment groups, respectively. There is a clear correlation between group size and quality of trials. Significant trials tend to be of better quality (one-sided, p = 0.0299) and tend to have a larger group size (one-sided, p = 0.0450)

Only three of the 13 trials [11, 72, 77] fulfilled all quality criteria; the other trials may have incurred biases. For example, intention-to-treat (ITT) analyses were performed in the absence of a validated outcome measure or a masked assessor in some trials [14, 18, 73, 78], and these studies were considered moderate quality evidence. Good and moderate quality studies reported adequate randomization procedures and case number determinations. Other trials [16, 70, 71, 74–76] were considered low quality. Several reasons led to a downgrading of these trials. For example, the RCT by Heymen [76] lacked a validated score, had a large variability due to small sample size, and 20 % of the patients were missing in three of four groups. Figure 2 summarizes treatments and effects after exclusion of low-quality trials and trials with too low current; see Supplementary Fig. S2 for all trials. Figure S3 displays effect sizes and 95 % confidence intervals (CI) for all trials. First, no trial showed superiority of control. Second, no trial showed superiority of any monotherapy when compared with BF + ES. The longer and more intensive the treatment, the stronger the effects (Supplementary Tables S4 and S5). This finding holds both within and across trials.

Fig. 2

Results and quality of clinical trials of at least moderate quality investigating biofeedback (BF), sufficient electrical stimulation (ES), or biofeedback plus electrical stimulation (BF + ES). A triangle denotes a significant difference between two stimulation modes; a bar represents a difference that is not statistically significant. Trial quality is color coded: moderate, high. Size of bars or triangles is proportional to case numbers ranging from 40 [73] to 171 [18]. For example, the trial by Heymen and colleagues [14] is of moderate quality, demonstrated superiority of BF over PFE alone, and randomized a total of 108 patients. The off-diagonals represent the two monotherapies; the top left represents control and includes pelvic floor exercises (PFE) or some other standard therapy which is neither BF nor ES. The bottom right represents the combination therapy BF + ES

Quantitative data summary

For meta-analysis, we chose the remission rate because it was the endpoint with the greatest patient relevance, and it was the most completely reported endpoint. Results were similar for other endpoints (Cleveland Clinic score (CCS): Table 1; z scores for any validated outcome: Supplementary Table S5).

Table 1 Remission rate (in %) and duration and response (change in Cleveland Clinic score) by treatment group and trial quality for control, biofeedback, electrical stimulation, and combination of BF and ES

The meta-ANOVA of trials of at least moderate quality showed a tendency toward inferiority of ES when compared with control (RR = 0.47; 95% CI, 0.13–1.72; p = 0.25) and superiority of BF over control (RR = 2.35; 95% CI, 1.33–4.16; p = 0.0033) (Table 2). The combination therapy BF + ES was superior to both BF (RR = 2.12; 95% CI, 1.42–3.16; p = 0.00022) and ES (RR = 22.97; 95% CI, 1.81–291.69; p = 0.016) (Table 2). No positive results on ES versus BF or control emerged. RRs for BF over control and for ES on top of BF over BF were close to 2 both when all trials were included and when adjustments were made for treatment duration or duration of continence.

Table 2 Meta-analysis results from two-way analysis of variance for remission for all trials of at least moderate quality. Electrical stimulation tended to be inferior to control, while biofeedback was superior to control. The combination therapy BF + ES was superior to both monotherapies
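As a plausibility check on the reported estimates, the sketch below back-calculates a two-sided p value from an RR and its 95 % confidence interval. It assumes that the interval is symmetric on the log scale and that the test is a normal-approximation (Wald) test; the review does not state how its p values were derived, so this is an assumption, and the helper name `p_from_ci` is hypothetical. Only the RRs and intervals are taken from the text.

```python
import math


def p_from_ci(rr, lower, upper, z_crit=1.959964):
    """Approximate z statistic and two-sided p value from an RR and its 95% CI.

    Assumes a Wald-type interval that is symmetric around log(RR).
    """
    se_log = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(rr) / se_log
    # Two-sided tail probability of the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# RRs and 95% CIs as reported in the text for the moderate-to-high quality trials
reported = {
    "ES vs control": (0.47, 0.13, 1.72),
    "BF vs control": (2.35, 1.33, 4.16),
    "BF + ES vs BF": (2.12, 1.42, 3.16),
    "BF + ES vs ES": (22.97, 1.81, 291.69),
}

for label, (rr, lower, upper) in reported.items():
    z, p = p_from_ci(rr, lower, upper)
    print(f"{label}: z = {z:.2f}, p = {p:.5f}")
```

Under these assumptions, the recomputed p values should lie close to those given in the text; small deviations would be expected from rounding of the published interval limits.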

When AM-MF ES was considered to be a different therapy concept from LFS, similar results were obtained (Supplementary Table S6). Specifically, neither LFS nor the combination of LFS and BF was superior to not doing these therapies (p = 0.96 and p = 0.49), while BF alone and the combination of AM-MF ES and BF were superior to not doing these therapies (p = 0.005 and p = 0.002; Supplementary Table S6).

EMG-BF plus AM-MF ES brought 42 and 54 % of patients into remission [11, 72]. With the combination of BF and LFS, 75 % of patients were asymptomatic in [73], 27 % were completely asymptomatic in [74], and no patient was cured in [16]. Different modes of BF yielded remission rates between 20 and 54 % [72, 73, 75, 76]. LFS yielded the worst remission rates, which varied between 0 and 4 % [11, 16, 70, 78].

Safety

Reporting of adverse events was patchy and not in accordance with definitions [71]. SAEs were reported in the text or in flow charts of only three trials: 2 in 15 patient years of insufficient or sham ES [78], 5 in 64 patient years of BF or control [18], 8 in 59 patient years of EMG-BF plus AM-MF ES, and 5 in 59 patient years of EMG-triggered stimulation [72]. This indicates homogeneous populations with 0.15 disruptive SAE per patient year. ADEs that led to treatment discontinuation were reported for a total of 31 cases (5.6 %) in seven trials with 728 participants [11, 14, 18, 71, 72, 78, 79] and in one description of low-frequency ES [13]. Nondisruptive ADEs were reported in one trial [11]: persisting pain affected close to half of the patients using LFS and none of those using AM-MF stimulation.

Grading the quality of evidence

The number of patients in remission is considered to be a valid and patient-relevant endpoint as are validated incontinence scores when masked observers are used and when data are analyzed on an ITT basis. However, the definition of remission differed across trials. It was described as asymptomatic [73], completely asymptomatic [74], best possible effect on a 10-point scale [70], no accident in 1 week [14, 18, 75, 76], a score of 0 in an incontinence scale when asked about the last 30 days [11, 71, 72], and cured [16, 18, 78]. The different definitions had an effect on the proportion of missing data which was as large as 39 % because of the incompleteness of patient diaries [18].

One aspect of grading the quality of trials is the duration-response gradient. Two trials reported assessments at 3-month intervals, and a duration-response gradient could be observed [80]. Specifically, the longer the treatment, the more patients were in remission and the better the treatment response.

Risk of bias across studies

There are two obvious sources of heterogeneity between studies. First, the spectrum of patients included in the trials was diverse. Trials in predominantly younger mothers cannot be compared with trials in elderly patients of both sexes because of the different etiology in these patients. There are also other inclusion/exclusion factors which might be prognostic for treatment effect. Anatomic defect or chronic inflammatory bowel disease was an explicit exclusion criterion in some studies [77], while these patients were included in other trials [14]. Previous anorectal surgery and anal sphincter damage were classified differently across studies, and the extent of heterogeneity remained unclear after data extraction.

Second, the treatment forms varied substantially (Supplementary Table S4), and the most important differences were type of treatment, patient position, and intensity of first- and second-line treatment. ES studies differed in treatment duration and average current, which was sometimes even below the therapeutic window.

Six included trials were found in public registries [81–86]. Selection or publication biases were not apparent. Registration information and consistency between information in the registry and the published report need improvement. Specifically, in four trials, there was no information or inconsistent information on primary and secondary outcomes. The primary endpoint was altered in two trials [14, 72], in both trials for very good reasons.

Discussion

Following up on promising results of the combination of ES with BF, this systematic review revealed more high-quality evidence than previous reviews because of the publication of several new trials [11, 14–16, 70–72, 87]. The clinical development of BF and ES can be grouped into three general phases. In the first phase, diverse case series were reported, and those had to be excluded from this systematic review.

In the second phase, narrow patient populations were studied. Specifically, fecal incontinence in younger females after delivery was considered in several trials [70, 73–75]. Outcome measures included patient diary and manometry. While manometry does not adequately reflect treatment response [16], diaries were incomplete with missing data as high as 39 % [18]. To avoid missing data in questionnaires, patients are nowadays asked about continence/incontinence in the recent past, such as the preceding week or month.

In the third phase, several RCTs studied the intensity, duration, and the mode of BF and ES treatments. BF was given optically or acoustically, signaling change in pressure, in electromyogram (EMG) or ultrasound. EMG devices for home use were developed so that daily sessions at home were added to sessions with a therapist. ES used different frequencies and current strengths. Such “dose finding” sometimes required more than two treatment groups. The studies tended to be small so that not all comparisons were significant. The combination BF + ES was studied in recent RCTs of higher quality, which included sufficiently large numbers of patients, investigated several patient relevant endpoints at the final follow-up and also reported changes over time.

Two high-quality RCTs showed that the combination of AM-MF ES plus EMG-BF was consistently efficacious, with 50 % of patients being continent after 6 or 9 months of treatment. This treatment was consistently superior to low-frequency ES and to BF alone.

Two important aspects need to be addressed in future studies on fecal incontinence. First, an in-depth discussion of the primary patient-relevant endpoint to be used is required. Specifically, Norton [88] argued that QoL should be the primary outcome. We do not agree with her conclusion. If the aim is to treat patients suffering from fecal incontinence, a successful treatment needs to make a patient continent. The degree of incontinence measured using a validated score, such as the CCS, is semiquantitative and requires fewer study subjects than the dichotomous continent/incontinent endpoint. However, it is difficult to measure continence reliably because of different patient perceptions.

QoL is an important patient-related endpoint. If QoL is assessed in an elderly population suffering from fecal incontinence, the constructs determining their QoL need to be assessed reliably. For example, the fecal incontinence QoL questionnaire asks about sexual activities, traveling by plane or train, or eating out [89]. Patients seen in our own studies stated that these activities did not determine their QoL. The most important restriction on their life was the personal stress caused by fecal incontinence; an important aspect was that, whatever they did, the toilet had to be within reach [90]. There is a clear need to develop a reliable and valid QoL instrument for an elderly population suffering from fecal incontinence.

Remissions and responses were more frequent the longer the therapy lasted. This was demonstrated within two trials of highest quality and across trials. In consequence, both individual results and the synthesis could be graded higher.

In their recent systematic review, Norton and Cody [19] stated “No study reported any adverse events or deterioration in symptoms, …”. According to this systematic review, this statement needs to be corrected. Although reporting of safety was scarce, there were trials with SAEs, such as mortality, and even ADEs were reported in seven of 13 trials. Deterioration of symptoms was commonly reported as mean changes with standard deviations indicating a considerable proportion of changes for the worse. Specifically, [11] explicitly reported deterioration in 10.3 % under AM-MF ES plus EMG-BF and 82.9 % for LFS.

In their systematic review, Hosker and colleagues [17] stated that “electrical stimulation can cause a tissue reaction at the site of the electrodes. This usually resolves speedily when stimulation is stopped.” This statement is in line with the findings of [11], who reported that approximately half of the patients receiving LFS treatment complained of pain during stimulation, and a quarter could not tolerate the high currents required to reach the motor threshold. Some patients reported a feeling of pressure that persisted for hours after training. However, these side effects in the trial were only minor owing to thorough precautions, which cannot be taken for granted in daily practice.

In separate work, we showed [13] that ADEs can occur with some devices. A major concern is the danger of tissue damage at the electrode/tissue interface. This is described in detail in the Supplementary Text 3 of the Electronic supplementary material.

Using the GRADE approach, we graded the quality of the trials and summarized the evidence (Table 3). In their recent review, Norton and Cody [19] stated that “treatment options for fecal incontinence have not yet been investigated by means of well-designed trials.” The result of our systematic review is different. We have identified three well-designed randomized controlled trials of high quality [11, 72, 77] and another three trials of medium quality and sufficient ES [14, 18, 73].

Table 3 GRADE evidence profile of 5 trials investigating electrical stimulation plus biofeedback (ES + BF) combinations

One limitation of this review is that observational studies were excluded. Such quasi-randomized studies were considered as evidence of safe treatments involving properly working devices rather than evidence of efficacy. Furthermore, trial registries did not cover all trials before the 2008 version of the Declaration of Helsinki.

Despite these limitations, this systematic review is the first considering BF and ES to be a combination therapy, and in a supplementary analysis LFS was distinguished from AM-MF ES. An overview of the component treatments, their respective duration and intensity gradients is provided. Endpoints were weighted by grade of evidence to embed and support the results of pivotal trials. A first survey of safety is provided and discussed.

Conclusions

This systematic review demonstrated superiority of BF over control and of the combination therapy BF + ES over BF and ES monotherapies. These findings were stable in sensitivity analyses. It was additionally found that LFS did not have a positive effect, either alone or in combination with BF. Both BF alone and AM-MF plus BF were superior to not doing these therapies. LFS carries a risk of pain and a device-specific risk of tissue damage. There is sufficient evidence of high quality for a stronger recommendation for a certain treatment regimen of BF, ES, or BF + ES to patients with fecal incontinence. Specifically, we conclude that the combination of AM-MF ES in an EMG-triggered application mode plus EMG-BF should gain a stronger recommendation as specialized management. BF deserves a weaker recommendation.