Introduction

Overactive bladder (OAB) is a chronic debilitating syndrome of the lower urinary tract [1]. The reported prevalence in adults 40 years of age or older is as high as 33% among women and 16% among men and increases with age among both sexes [2]. Similar prevalence estimates have been observed across the USA and Europe, with somewhat lower rates reported in Asian countries [3]. Over half of all adults living with OAB experience bothersome symptoms, which include urinary urgency with or without frequency and incontinence, as well as nocturia, intermittency, slow urine stream, urine strain, incomplete emptying of urine and post-micturition dribble [4]. Urinary frequency is the most commonly reported symptom [5, 6] with urge incontinence increasing significantly for women aged over 44 and for men aged over 64 [7]. First-line treatment for OAB typically includes lifestyle and behaviour modifications. Following this, second-line treatments generally involve pharmacotherapy with antimuscarinic or beta-3 adrenergic agonist therapies, such as mirabegron. Subsequent treatment options for patients who fail these interventions include onabotulinumtoxinA, a neurotoxin that is injected into the bladder.

Urinary incontinence, including that due to OAB, has a substantial negative impact on health-related quality of life (HRQoL) and mental health [4, 5, 7], and interferes with daily activities [4, 8]. As the impact of OAB symptoms may be difficult to measure directly, efforts to quantify it from the patients’ perspective have led to the development of a variety of OAB- and urology-specific patient-reported outcome (PRO) instruments. These include the OAB Questionnaire (OAB-q), the King’s Health Questionnaire (KHQ), the Patient Perception of Bladder Condition (PPBC) instrument, the Incontinence Impact Questionnaire (IIQ) and the Urinary Distress Inventory (UDI-6). The instruments enable characterization of OAB symptom control and quantify HRQoL impact, and are commonly used as outcome measures in trials of OAB treatments. Some (e.g. OAB-q, KHQ) also permit conversion of scores to a 0–1 utility scale [9, 10]. Importantly, validation of a minimal important difference (MID) has made PRO instruments useful tools for identifying clinically significant effects in response to OAB treatment [11, 12].

In terms of measure properties, the OAB-q consists of a validated eight-item symptom bother scale used to determine urinary frequency, nocturia, urgency and continence [13, 14]. It is the first instrument to include an evaluation of both incontinence and continence in OAB [13], and is therefore widely used to assess patients with OAB in general. The OAB-q also includes a 25-item HRQoL scale rating performance in each of four subscales (coping, sleep, concern and social interaction). Each item (including symptom bother items) is scored on a 6-point Likert scale ranging from 1 point (not at all) to 6 points (a very great deal) [13]. A symptom bother total score is derived from the eight symptom bother item scores, then converted into a 0–100 scale on which higher scores signify greater symptom bother and lower scores signify less symptom bother [15].
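As an illustration of the conversion described above, the following minimal sketch rescales eight item scores (each 1 to 6) to a 0–100 symptom bother score. It assumes a simple linear transformation between the lowest and highest possible raw totals; the function name and the input checks are illustrative and are not taken from the instrument's scoring manual.

```python
def symptom_bother_score(item_scores):
    """Rescale eight 1-6 Likert item scores to a 0-100 symptom bother score.

    Assumption: a linear transformation of the raw total, so that the lowest
    possible total (8) maps to 0 and the highest possible total (48) maps to 100.
    """
    if len(item_scores) != 8:
        raise ValueError("expected eight symptom bother items")
    if any(not 1 <= score <= 6 for score in item_scores):
        raise ValueError("each item must be scored from 1 to 6")

    raw_total = sum(item_scores)
    lowest_possible = 8 * 1
    highest_possible = 8 * 6
    return (raw_total - lowest_possible) / (highest_possible - lowest_possible) * 100


# Hypothetical respondent answering alternately 3 and 4 on the eight items.
example = symptom_bother_score([3, 4, 3, 4, 3, 4, 3, 4])
```

Under this assumption, a respondent answering "not at all" to every item scores 0 and one answering "a very great deal" to every item scores 100.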

Limited information is available about the frequency of use and sensitivity of PRO instruments, or the consistency of PRO data across clinical trials of OAB interventions. Given how many OAB-specific PROs are available, some criteria to guide measure selection to ensure that results can be compared would be helpful but are presently lacking. To address these knowledge gaps, the aim of this study was to identify the most commonly used instruments in studies of patients with OAB, determine which instruments are the most useful for measuring burden in OAB and assess the HRQoL implications of OAB symptoms and the effects of treatment.

Methods

Literature Search

A systematic review of the literature was conducted in Medline/PubMed and EMBASE databases using a customized search strategy (supplementary material Table S1) for literature published between January 2006 and November 2017. The search was not limited by country or geographic region; however, it was limited to articles published in English. The design and implementation of the systematic review were guided by the PICOS (Population, Interventions/comparators, Outcomes, Study design) criteria. The population of interest was adults with OAB, including idiopathic OAB, idiopathic urge urinary incontinence, non-neurogenic urge urinary incontinence or refractory detrusor overactivity, with/without urinary incontinence. To ensure a more homogeneous population, studies were excluded if the population had neurogenic OAB, primarily stress incontinence or a known cause of OAB such as pregnancy, neoplasm, spinal cord injury or surgery.

To be included, studies had to present sufficient longitudinal data (either baseline and end-of-treatment values, or change scores) to derive change over time using a disease-specific PRO instrument, including the OAB-q (as well as the abbreviated OAB-q short form [SF]); KHQ; PPBC; IIQ-7; UDI-6; Incontinence Specific Quality of Life (i-QOL); Hospital Anxiety Depression Scale (HADS); Overactive Bladder Satisfaction Questionnaire (OAB-S); Patient Perception of Intensity of Urgency Scale (PPIUS); Work Productivity and Activity Impairment (WPAI); and/or the Short Form health survey (SF, all versions). Observational studies, single-arm trials and randomized controlled trials (RCTs) were all eligible. Studies with a cross-sectional study design, fewer than 100 patients or insufficient information to determine a change in PROs were excluded. Two researchers independently screened abstracts and full-text articles for inclusion or exclusion. Any discrepancies were resolved through arbitration by a third researcher.

Assessment of Included Studies

The reporting standards of the included studies were assessed using the Consolidated Standards of Reporting Trials (CONSORT) criteria for clinical trials [16] and STrengthening the Reporting of OBservational studies in Epidemiology (STROBE) criteria for observational studies [17].

Data Extraction

Two researchers extracted data from eligible articles into a customized Microsoft Excel workbook. Study characteristics extracted included authors, country, year, study objective(s), study design, sample size and OAB treatments. Patient-related data extracted included study-level inclusion and exclusion criteria, International Statistical Classification of Diseases, Ninth Revision (ICD-9) codes used to identify OAB cohorts when relevant and demographic characteristics such as age, sex, comorbidity burden and previous OAB treatment. Outcomes data extracted included the mean and standard deviation of all PRO measures of interest at baseline and study end and/or reported change scores over time. Where reported, values of all subdomains were included in addition to overall summary scores. Utility values were extracted where available.

Evidence Synthesis

The form of data synthesis was driven by the type of data available. Summary tables were generated describing study design, sample size, treatments included, PROs measured and a narrative overview of patient population and results. Results were organized by instrument and then by study type (non-pharmacologic clinical trial, antimuscarinic clinical trial, mirabegron clinical trial, onabotulinumtoxinA clinical trial and observational study).

Criteria to assess the usefulness of available instruments were specified a priori, and developed in line with previously identified recommendations [18]. The following criteria were considered: that the instrument

  • Is commonly used in trials of OAB treatments and observational studies

  • Comprehensively considers a wide variety of OAB symptoms and bother

  • Is specific to OAB, rather than urological conditions more generally

  • Has MID data available

  • Can be used to generate a utility value

The properties of the three most frequently used OAB-specific PROs were tabulated and compared. Then, additional assessment and synthesis were performed for the most frequently used instrument, including graphical summary of change scores and comparison of change scores to the established MID.

This article is based on previously conducted studies and does not contain any studies with human participants or animals performed by any of the authors.

Results

Implementing the search strategy yielded 3425 abstracts, 500 of which underwent full-text review. Fifty-eight studies met the final inclusion criteria, seven of which were identified by hand-searching reference lists (Fig. 1). In general, the included studies were in line with the CONSORT and STROBE criteria reporting recommendations (supplementary material Tables S2–S9) [19].

Fig. 1 PRISMA flow diagram of records identified, included and excluded

Study and Patient Characteristics

All 58 studies reported results for multi-domain, OAB-specific measures with a MID available for comparison. Among these, 37 studies reported OAB-q scores (27 RCTs; three observational studies; one open-label extension; and six single-arm trials). KHQ scores were reported by 18 studies and i-QOL scores were reported by four studies (Table 1). Although the OAB-V8 (an abbreviated version of the OAB-q) was identified in the review, it was not eligible for inclusion in the present study because it (along with the OAB-V3) is more typically used to help improve patient and physician communication regarding symptoms, and therefore did not provide HRQoL burden over time. Similarly, the Bladder Self Assessment Questionnaire was identified in three studies; however, as it is more often used in a lower urinary tract symptoms (LUTS) population, it did not meet the PICOS criteria [20, 21].

Table 1 Summary characteristics of included studies

The 58 included studies consisted of 53 clinical trials and five observational [22,23,24,25,26] studies (Table 1). Of the clinical trials, 37 were RCTs [15, 27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62], 10 were single-arm trials [63,64,65,66,67,68,69,70,71,72] and six were open-label extension studies [73,74,75,76,77,78]. The clinical trials included eight mirabegron studies [38, 43,44,45, 48, 58, 59, 61], six onabotulinumtoxinA studies [30, 35, 49, 53, 54, 74], 35 antimuscarinic studies [15, 27,28,29, 31,32,33,34, 36, 37, 39,40,41,42, 46, 50,51,52, 55, 56, 60, 62,63,64,65,66,67,68,69,70,71,72,73, 75, 76], two sacral nerve modulation studies [77, 78], one staged InterStim procedure study [57] and one study of foot reflexology [47].

Among the 53 clinical trials, the mean number of participants per trial was 929.0 (range 106–3185), with a median length of follow-up of 12 weeks (range 3–156 weeks). Most clinical trials evaluated the impact of antimuscarinic therapies, most commonly fesoterodine (26.4%), tolterodine (18.9%) and solifenacin (18.9%). In the clinical trials six different OAB-specific measures were identified. The most frequently used PRO instruments were the OAB-q (34 studies; 27 RCTs, six single-arm trials and one open-label extension study), followed by the KHQ (18 studies; 14 RCTs, two single-arm trials and two open-label extension studies) and the PPBC (12 studies; nine RCTs and three open-label extension studies).

Among the five observational studies, the mean number of participants per study was 292.6 (range 100–632). These studies tended to be longer, with a median length of follow-up of 104 weeks (range 24–312 weeks). A variety of OAB treatments were considered, and three OAB-specific measures were used: the OAB-q (three studies), the IIQ-7 (two studies) and the UDI-6 (two studies) (Table 1).

Assessment of Measures

The most commonly administered PRO instrument was the OAB-q (64%), followed by the KHQ (31%) and the PPBC (21%) (Table 1). Other instruments were included in five or fewer studies, with the i-QOL (n = 4 studies) only used in onabotulinumtoxinA trials. How those measures performed against the assessment criteria is presented in Table 2. The OAB-q was the only instrument to meet all outlined criteria and was thus selected for more detailed synthesis. A summary of additional properties of the most commonly used PRO instruments in OAB studies can be found in the supplementary material Table S10.
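The percentages quoted above can be reproduced from the study counts reported in the text (37 OAB-q studies, 18 KHQ studies and 12 PPBC studies out of 58 included studies). The short sketch below is only an arithmetic check; the variable names are illustrative, and because some studies used more than one instrument the shares are not expected to sum to 100%.

```python
TOTAL_INCLUDED_STUDIES = 58

# Study counts as reported in the text; a study may use several instruments.
studies_per_instrument = {"OAB-q": 37, "KHQ": 18, "PPBC": 12}

# Share of included studies using each instrument, rounded to a whole percent.
usage_percent = {
    instrument: round(count / TOTAL_INCLUDED_STUDIES * 100)
    for instrument, count in studies_per_instrument.items()
}

for instrument, percent in usage_percent.items():
    print(f"{instrument}: {percent}% of included studies")
```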

Table 2 Comparison of summary criteria for the most commonly reported overactive bladder health-related quality of life instruments

OAB-q Studies and Outcomes

Data from 37 studies were included in the OAB-q data synthesis. Overall, assessment of changes to OAB-q scores was reported by 22 antimuscarinic trials [15, 28, 29, 32,33,34, 36, 39, 41, 46, 50, 52, 55, 56, 60, 63, 65, 66, 68, 70, 72, 76], seven mirabegron trials [38, 43, 44, 48, 58, 59, 61], two onabotulinumtoxinA trials [53, 54], two sacral nerve modulation trials [77, 78], one staged InterStim procedure study [57] and three observational studies [24,25,26].

Baseline OAB-q values, where reported, and change scores (final minus baseline) for symptom bother, overall HRQoL and each subscale (coping, concern, sleep, social) for individual studies are shown in Figs. 2 and 3, respectively. Improved HRQoL is indicated by positive change scores on individual HRQoL subscales, while for symptom bother, improvement is indicated by a negative change score from baseline.
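The sign convention for change scores can be made concrete with a small sketch. It assumes the MID of 10.0 points quoted in the Fig. 3 caption and treats a change of exactly the MID as meeting it; the function names, the scale labels and the example values are hypothetical and are not data from any included study.

```python
MID = 10.0  # minimal important difference, as quoted in the Fig. 3 caption


def change_score(baseline, final):
    """Change score as defined in the text: final minus baseline."""
    return final - baseline


def meets_mid(change, scale, mid=MID):
    """Return True if the change is an improvement of at least the MID.

    Symptom bother improves when the score falls (negative change);
    HRQoL scales improve when the score rises (positive change).
    """
    if scale == "symptom_bother":
        return change <= -mid
    return change >= mid


# Hypothetical study arm: symptom bother falls from 55 to 35,
# overall HRQoL rises from 58 to 66.
bother_change = change_score(55.0, 35.0)
hrqol_change = change_score(58.0, 66.0)

bother_improved = meets_mid(bother_change, "symptom_bother")
hrqol_improved = meets_mid(hrqol_change, "hrqol")
```

In this hypothetical arm the symptom bother change exceeds the MID, whereas the HRQoL change is positive but falls short of it.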

Fig. 2 Baseline OAB-q scores in included studies. All individual treatment arms within studies are included as a separate row. The overall HRQoL and symptom bother scale are included as separate columns. The four subscales (coping, concern, sleep, social) of the overall HRQoL are also included as separate columns. OAB-q scores range from 0 to 100 (a higher symptom bother score indicates greater symptom bother, while lower HRQoL scores indicate greater impact on quality of life)

Fig. 3 OAB-q change scores in included studies. All individual treatment arms within studies are included as a separate row. The overall HRQoL and symptom bother scale are included as separate columns. The four subscales (coping, concern, sleep, social) of the overall HRQoL are also included as separate columns. Minimal important difference (MID) = 10.0 for all scales

Baseline and change score data were reported for a total of 46 and 61 study arms, respectively. Across studies, baseline HRQoL scores ranged from 35 to 68 and symptom bother scores ranged from 43 to 74, with a majority of these (59%) falling within a relatively small range (52 to 60 for symptom bother scores and 52 to 64 for corresponding HRQoL scores) (Fig. 2). HRQoL and symptom bother change scores (including placebo arms) ranged from 3 to 37 and from −2 to −45, respectively. Of the 15 placebo arms reported on, all but one [55] showed a benefit relative to the MID in overall HRQoL, coping subscale, concern subscale and symptom bother scale; and 14 of the 15 placebo arms also showed a benefit for the sleep domain (Fig. 3).

Among the 22 antimuscarinic trials utilizing the OAB-q, all but one reported improvement in all domains relative to the MID. The exception was the open-label extension to a 12-week trial, for which results were measured relative to the end of the original trial [55, 76]. The smallest numeric improvements were observed consistently in the social domain across studies. Placebo-controlled trials reported greater improvements in antimuscarinic arms relative to placebo arms. Among studies including multiple antimuscarinic doses, the study conducted by Chapple et al. reported a dose–response effect for fesoterodine [29], while the study conducted by Yamaguchi did not [56]. Two other studies that included both fesoterodine and tolterodine arms did not report a notable difference in results between the two antimuscarinics [39, 60].

Among mirabegron clinical trials, all studies reported 12-week change scores and found improvements in all OAB-q domains relative to baseline based on the MID. Improvements were observed across doses of mirabegron (25 mg, 50 mg and 100 mg). Of the studies with multiple doses of mirabegron, one found a dose–response relationship [48], while the remaining two reported similar results across mirabegron doses [38, 44]. The change scores of greatest magnitude were associated with mirabegron given in combination with solifenacin [61]. When mirabegron was given as a monotherapy in RCTs, OAB-q improvements were greater for mirabegron than for placebo and similar to those for antimuscarinics.

Two published studies reported OAB-q results at 26 weeks for onabotulinumtoxinA trials. Both studies were based on the same underlying trial data; one publication reported OAB-q results directly [53] while the other reported converted utility values for cost-effectiveness analysis [54]. The OAB-q results found clinically significant improvement, numerically greater than the MID, corresponding to a calculated utility improvement of approximately 0.05.

Three observational studies reported OAB-q results. Patients received antimuscarinics in two of the studies [24, 25], and a staged InterStim procedure in the third [26]. For the most part, benefits were observed beyond the MID in at least one OAB-q domain. The exception to this was a solifenacin study, which stratified results by response status, and MID results were not achieved in the non-responder group [25].

Discussion

PROs that determine the impact of interventions from the patients’ perspective and quantify changes in HRQoL have become standard measures for evaluating OAB treatments. Several different instruments have been developed and validated, with optimal instrument selection guided by the study design and target patient population.

To better understand the current OAB research arena, a systematic literature review was undertaken to characterize the published literature reporting PROs associated with OAB treatment over time, identify the measures most commonly used and their alignment with specified criteria of interest. As previously reported [79], among bladder-specific instruments, the OAB-q remains the most frequently implemented PRO measure in clinical trials of pharmacologic OAB therapies. OAB-q was also the only commonly reported urology instrument to comprehensively assess OAB symptoms, be specific to an OAB population, have a published MID and be convertible to a utility score via a published algorithm. Together, the OAB-q, KHQ and PPBC were used to measure the patients’ perspectives 85% of the time, with some studies including multiple instruments. In contrast, there is no identifiable pattern of PRO instrument use among the small number of observational studies conducted to date. While measure selection in observational studies may be driven by characteristics of a population or study objectives, for ease of comparability, future randomized trials may benefit from using the OAB-q, KHQ and PPBC.

Extracted data from clinical trials using the OAB-q revealed a wide range of baseline HRQoL and symptom scores, with most participants having symptom bother scores of around 55 points and HRQoL scores of around 58 points. Scores ranging from 46 to 62 on the symptom bother scale and inversely from 50 to 65 on the HRQoL subscale have previously been correlated with patients’ perceiving the impact of OAB as moderate to severe [80]. Synthesis of data from studies using the OAB-q to monitor treatment effects revealed improvement relative to the MID with placebo in most studies and in most domains; however, in general, where comparisons had been made, treatment-related benefits were consistently greater than placebo-related effects. The smallest change scores (with or without treatment) tended to be observed in the social domain, although all domains tended to be associated with change scores of at least the MID. Data from some of the studies including arms with interventions administered at different dosages demonstrated the OAB-q to be sufficiently sensitive to identify dose–response relationships, although this was not observed consistently across studies. The results of this synthesis could be used to characterize baseline estimates of symptom bother and HRQoL impact among patients with OAB and the impact of treatments on these outcomes, both of which can serve as benchmarks for future comparisons.

Strengths of this study include the comprehensive and systematic approach undertaken to identify and review studies. Careful review was undertaken to ensure that data were not counted twice where individual trials were described in multiple publications and that open-label extension studies were correctly linked to the original trials. Thus, the data extracted comprise a comprehensive repository of PRO data for patients with OAB, facilitating a broad overview as presented here, or the potential for a more focused review of specific instruments, study designs or OAB therapies. Broad study inclusion criteria encompassed observational studies and non-pharmacological clinical trials in addition to clinical trials of pharmacological treatments, although most identified studies were treatment trials. Novel data visualizations were developed to succinctly characterize baseline values and change scores for all scales and subscales of the OAB-q across multiple studies.

Limitations of the review include that numerous studies potentially of interest were excluded because change scores, or sufficient data to calculate change scores, for a PRO were not reported by the authors. Another limitation is that the wide variety of instruments and study types included, and the variation in reporting domains within instruments, present a challenge for succinct synthesis of results. As such, the results presented here can be considered a master data set of published PRO results, from which details can be derived for more focused analyses. Finally, while several of the instruments have been validated and widely used, many instruments, including the most commonly used OAB-q, were developed prior to contemporary United States Food and Drug Administration (US FDA) guidance for PROs [81], and may not be reflective of current best practices. Thus, while use of the OAB-q in future studies would allow for the greatest breadth of contextualization within the existing literature, its applicability is limited by lack of confirmation of development in accordance with FDA recommendations.

Conclusion

Clinical trials assessing the efficacy of antimuscarinics, mirabegron and onabotulinumtoxinA for the treatment of OAB have observed improved HRQoL. The published evidence in treatment studies indicates that improvements in the OAB-q of at least the MID are observed over time. Where comparative data were available, active treatments tended to be associated with a greater improvement than placebo, and no differences were observed across active treatments (e.g. across alternative antimuscarinic therapies, or between mirabegron and an antimuscarinic). These trends are consistent with observed data for clinical outcomes, suggesting consistency between clinical outcomes and OAB-specific PRO measures. These findings provide benchmark values for OAB-q levels across the current published literature and can inform future clinical trial development to improve consistency of data collection, making for a more robust evidence base that facilitates quantitative cross-trial comparisons of the safety and efficacy of pharmacological OAB interventions.