Reliability of supraspinal correlates to lower urinary tract stimulation in healthy participants – A fMRI study

ABSTRACT Previous functional neuroimaging studies provided evidence for a specific supraspinal network involved in lower urinary tract (LUT) control. However, data on the reliability of blood oxygenation level-dependent (BOLD) signal changes during LUT task-related functional magnetic resonance imaging (fMRI) across separate measurements are lacking. Such evidence is crucial to evaluate whether fMRI can be used to assess supraspinal responses to LUT treatments.
Therefore, we prospectively assessed task-specific supraspinal responses in 20 healthy participants undergoing two fMRI measurements (test-retest) 5–8 weeks apart. The fMRI measurements, conducted in a 3T magnetic resonance (MR) scanner, comprised a block design of repetitive bladder filling and drainage using an automated MR-compatible and MR-synchronized infusion-drainage device. Following transurethral catheterization and bladder pre-filling with body-warm saline until participants perceived a persistent desire to void (START condition), fMRI was recorded during repetitive blocks (each 15 s) of INFUSION and WITHDRAWAL of 100 mL body-warm saline into and from the bladder, respectively. BOLD signal changes were calculated for INFUSION minus START. In addition to whole brain analysis, we assessed BOLD signal changes within multiple a priori regions of interest (ROIs), i.e. brain areas known from previous literature to be involved in LUT control. To evaluate the reliability of the fMRI results between visits, we applied several types of analyses: coefficient of variation (CV), intraclass correlation coefficient (ICC), Sørensen-Dice index, Bland-Altman method, and block-wise BOLD signal comparison.
All participants completed the study without adverse events. The desire to void was rated significantly higher for INFUSION than for START or WITHDRAWAL at both measurements, without any effect of visit. At whole brain level, significant (p < 0.05, cluster corrected, k ≥ 41 voxels) BOLD signal changes were found for the contrast INFUSION minus START in several brain areas. Activation maps from both measurements overlapped in the orbitofrontal cortex, insula, ventrolateral prefrontal cortex (VLPFC), and inferior parietal lobe. The two highest ICCs, based on an ROI's mean beta weight, were 0.55 (right insular cortex) and 0.47 (VLPFC). Spatial congruency (Sørensen-Dice index) of all voxels within each ROI between measurements was highest in the insular cortex (left 0.55, right 0.44). In addition, the mean beta weights of the right insula and right VLPFC showed the lowest CV and the narrowest Bland-Altman 95% limits of agreement.
In conclusion, the right insula and right VLPFC were the two most reliable task-specific ROIs using our automated, MR-synchronized protocol. Achieving high reliability with a viscero-sensory/interoceptive task such as repetitive bladder filling remains challenging, and further work is warranted to better understand which factors influence fMRI outcomes and, ultimately, to assess LUT treatment effects at the supraspinal level.
Highlights:
First multisession fMRI study on reliability of supraspinal bladder control.
Reliability of supraspinal response is a prerequisite for assessing LUT treatment effects.
Right insular cortex and VLPFC show the most reliable response to the bladder filling task.
Automated, MR-synchronized protocol facilitates reliable viscero-sensory tasks.

Introduction
The human lower urinary tract (LUT) has two important homeostatic functions: 1) low-pressure, continent and symptom-free storage of urine at adequate volumes, and 2) periodic, self-determined and complete release of the stored urine. To accomplish both, the LUT relies on a complex interplay of the central, peripheral and autonomic nervous systems under supraspinal control. Supraspinal control, in particular, is relevant for the voluntary regulation of LUT function, which in turn allows LUT function to be adapted to external circumstances, e.g. choosing a safe and acceptable time point to empty the bladder.
Nowadays, fMRI has become the main neuroimaging modality to investigate supraspinal LUT control, including post-treatment evaluation (Blok et al., 2006; Kavia et al., 2010; Pontari et al., 2010). Although previous studies have investigated the reproducibility of fMRI findings using auditory (Wei et al., 2004; Caceres et al., 2009), memory (Caceres et al., 2009; Plichta et al., 2012), motor (Gountouna et al., 2010; Quiton et al., 2014) or pain tasks (Quiton et al., 2014), reliability data for interoceptive tasks, particularly those involving the LUT, are scarce. To date, a single study has provided data on this topic, but only for repeated investigations within the same session (Clarkson et al., 2017).
However, information on reliability across separate sessions is important for longitudinal studies and for evaluating LUT treatment effects at the supraspinal level. Therefore, our aim was to evaluate the reliability of supraspinal responses to bladder filling in healthy participants using our automated fMRI protocol (Jarrahi et al., 2016; Leitner et al., 2017). Given the high accuracy in timing and volume delivery of a standardized, automated fMRI protocol, we hypothesized that our approach would generate reliable activations of literature-based supraspinal areas involved in LUT control.

Study design
This is a prospective single-center neuroimaging study conducted at the University of Zürich, Zürich, Switzerland. The study comprised two MRI measurements at two separate visits 5-8 weeks apart (Fig. 1).

Ethics
This study was approved by the local ethics committee (Kantonale Ethikkommission Zürich, KEK-ZH-Nr. 2011-0346) and was performed in accordance with the World Medical Association Declaration of Helsinki (World Medical Association, 1964) and the guidelines of the Swiss Academy of Medical Sciences (2009). Furthermore, handling of all personal data strictly complied with the Swiss federal law on data protection (The Federal Authorities of the Swiss Confederation, 1992; Swiss Academy of Medical Sciences, 2009). This study has been registered at clinicaltrials.gov (http://www.clinicaltrials.gov/ct2/show/NCT01768910).

Participants
We recruited a cohort of 20 healthy participants within an age range of 18-55 years. All participants provided written informed consent prior to study inclusion.
Exclusion criteria were LUT symptoms (LUTS), urological or neurological pathology, current or recurrent urinary tract infection (UTI), hematuria, pregnancy, previous surgery for urological or neurological reasons, regular medication intake (except contraceptives), and incompatibility with MRI safety rules (e.g. ferromagnetic or electromagnetic implants). These criteria were assessed on the basis of a complete medical history, vital signs, physical and neurological examinations (including examination of urogenital sensation, bulbocavernosus reflex, anal reflex, anal sphincter tone, and anal squeeze response), urine analysis, urodynamic investigation, the Mini Mental State Examination (MMSE) (Folstein et al., 1975), the Hospital Anxiety and Depression Scale (HADS) (Zigmond and Snaith, 1983), and a 3-day bladder diary, using predefined cut-offs (Table 1). The bladder diaries were completed over three consecutive days, recording the time points and volumes (mL) of drinking and micturition, as well as the number of incontinence episodes, pad usage, and pain levels (0-10) associated with urine storage and/or micturition. Additionally, participants completed standardized urological questionnaires, namely the International Consultation on Incontinence Questionnaire (ICIQ) for LUT symptoms in females (ICIQ-FLUTS) and males (ICIQ-MLUTS) (Jackson et al., 1996; Donovan et al., 2000), and the Overactive Bladder Questionnaire short form (OAB-q SF) (Coyne et al., 2015).
Fig. 1. Sequence of study visits and procedures.

Pre- and post-measurement procedures
All participants were asked to abstain from any product containing caffeine or nicotine for at least six hours prior to each fMRI measurement visit. At the beginning of each visit, a urinalysis was performed to exclude urinary tract infection or pregnancy. Prior to entering the magnetic resonance (MR) scanner, all participants removed any ferromagnetic items and changed into standardized clinical scrubs. All participants were trained to use an MR-compatible handheld response system (Jarrahi et al., 2013) to rate their desire to void on a numeric rating scale (NRS, ranging from 0 = no desire to void at all to 10 = strongest desire to void), which was presented to each participant on the video screen.
Prior to starting the fMRI paradigm, a 14 Fr Foley catheter (Uromed, Germany) was placed in the bladder. The bladder was then emptied, and the catheter balloon was filled with 5 mL sodium chloride solution 0.9% (Braun, Germany) to avoid unexpected dislocation. Subsequently, the bladder was manually prefilled until each participant reported a persistent desire to void, i.e. an NRS score of 6. The individual prefilled bladder volume applied for the 1st fMRI measurement was also used for the 2nd fMRI measurement to ensure comparable measurement conditions. Before removing the Foley catheter at the end of each fMRI measurement, the bladder was emptied to record the corresponding urine volume. To screen for adverse events, each participant was interviewed via telephone or email within one week and one month after the 2nd fMRI measurement visit (Fig. 1).

Acquisition of neuro-imaging data
Structural and functional MRI data were recorded on a Philips Ingenia 3 Tesla scanner (Philips Medical Systems, Best, the Netherlands) using a 15-channel Philips Sense head coil.

Anatomical imaging
Anatomical data were acquired with a high-resolution 3D T1-weighted turbo field gradient echo (TFE) pulse sequence covering the entire cerebrum as well as the cervical spinal cord. The imaging parameters were: isotropic 1 mm³ resolution, field of view 256 × 256 × 180 mm, matrix 256 × 256, 180 slices (scan time 305 s), repetition time (TR) = 6.92 ms, echo time (TE) = 3.1 ms, slice thickness = 1 mm, no inter-slice gap, and flip angle = 8°.

Functional MRI paradigm
The fMRI paradigm (Fig. 2) comprised eight blocks of repetitive filling (INFUSION condition) and draining (WITHDRAWAL condition) of 100 mL body-warm saline. To fill and drain the bladder precisely, i.e. with exact volume delivery and timing, we used a custom-made automated MR-compatible and MR-synchronized infusion-drainage device (IDD) (Jarrahi et al., 2016; Leitner et al., 2017). The INFUSION (15 s) and WITHDRAWAL (15 s) blocks were interspersed with PLATEAU (9 s), REST (7-9 s, mean 8 s), and RATING (15 s) periods (Fig. 2). REST periods served as an interlude allowing the blood oxygenation level-dependent (BOLD) signal to return to baseline, thereby preventing motor activity related to the rating from carrying over into the next filling block. During the RATING periods that followed each INFUSION and WITHDRAWAL period, participants rated their desire to void.
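The timing described above can be turned into an event list, which is what a block-design GLM needs as input. The following is a minimal sketch, assuming the condition order given in the Fig. 2 caption and a continuous uniform REST jitter between 7 and 9 s; the helper `build_schedule` is hypothetical and is not the control software of the infusion-drainage device.

```python
import random

# Durations in seconds, as stated in the paradigm description.
START = 60.0
INFUSION = WITHDRAWAL = RATING = 15.0
PLATEAU = 9.0
N_BLOCKS = 8


def jittered_rest(rng):
    """REST jitter between 7 and 9 s (continuous uniform draw is an assumption)."""
    return rng.uniform(7.0, 9.0)


def build_schedule(seed=0):
    """Return a list of (condition, onset_s, duration_s) tuples.

    Hypothetical helper: the order follows the Fig. 2 caption; the exact jitter
    distribution used during acquisition is not specified and is assumed here.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0

    def add(name, duration):
        nonlocal t
        events.append((name, t, duration))
        t += duration

    # Baseline: START, baseline rating, jittered rest.
    add("START", START)
    add("RATING", RATING)
    add("REST", jittered_rest(rng))
    # Eight repetitive filling/draining blocks.
    for _ in range(N_BLOCKS):
        add("INFUSION", INFUSION)
        add("PLATEAU", PLATEAU)
        add("RATING", RATING)
        add("REST", jittered_rest(rng))
        add("WITHDRAWAL", WITHDRAWAL)
        add("PLATEAU", PLATEAU)
        add("RATING", RATING)
        add("REST", jittered_rest(rng))
    return events


if __name__ == "__main__":
    schedule = build_schedule()
    onsets = [onset for name, onset, _ in schedule if name == "INFUSION"]
    print(len(onsets), [round(o, 1) for o in onsets])
```

With a mean REST of 8 s, consecutive INFUSION onsets are about 94 s apart, and the first one falls roughly 83 s after the start of the run.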

Functional imaging
Functional time series were acquired with a field echo (i.e. gradient echo) echo planar imaging (FEEPI) pulse sequence. Thirty-four axial slices covered the entire cerebrum in ascending order. Other scan parameters were as follows: TR = 2000 ms, TE = 16 ms, slice thickness = 3 mm, inter-slice gap = 1 mm, flip angle = 80°, field of view = 240 × 135 × 240 mm, reconstructed image matrix 80 × 80 voxels, and in-plane voxel resolution 3 × 3 mm. All images were obtained in an oblique axial orientation covering the entire brain including the cerebellum and rostral brainstem (including the pons). The task comprised 445 dynamic scans (scan duration 902 s). The first four scans were dummy scans to reach steady-state magnetization.
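The reported EPI parameters are internally consistent, which can be checked with simple arithmetic. This sketch assumes that the 135 mm dimension of the field of view lies along the slice direction; the variable names are illustrative only.

```python
# Acquisition geometry implied by the reported EPI parameters (values from the text).
N_SLICES = 34
SLICE_THICKNESS_MM = 3.0
GAP_MM = 1.0
FOV_INPLANE_MM = 240.0
MATRIX = 80

# Slab coverage: each slice contributes its thickness; gaps lie only between slices.
slab_mm = N_SLICES * SLICE_THICKNESS_MM + (N_SLICES - 1) * GAP_MM
# In-plane voxel size: field of view divided by the reconstructed matrix.
voxel_inplane_mm = FOV_INPLANE_MM / MATRIX
# Distance between slice centres along the slice direction.
slice_spacing_mm = SLICE_THICKNESS_MM + GAP_MM

print(slab_mm, voxel_inplane_mm, slice_spacing_mm)  # 135.0 3.0 4.0
```

The 135 mm slab and the 3 mm in-plane voxel size thus follow directly from 34 slices of 3 mm with 1 mm gaps and an 80 × 80 matrix over 240 mm.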

Statistical analyses of clinical and behavioural data
Statistical analyses of clinical and behavioural data were performed using IBM's Statistical Package for the Social Sciences (SPSS) version 25.0 (Armonk, New York, USA). Normal distribution of data and model residuals was tested using the Shapiro-Wilk test and Q-Q plots. Depending on the distribution, parametric or non-parametric tests were chosen. Bonferroni correction was used to account for multiple comparisons. The threshold for a significant difference was p < 0.05.
Descriptive statistics were used for clinical parameters. Results are presented as means and standard deviations (SD) for continuous data (e.g. age, 3-day bladder diary, and questionnaires) and as absolute numbers for binary data (e.g. intact or impaired findings in the neuro-urological examination).
To test the effect of time and condition on the rating of the desire to void, linear mixed models were employed. First, the interaction effect between condition (i.e. INFUSION, WITHDRAWAL, and START) and time-point (i.e. 1st and 2nd fMRI measurement) on the rating of the desire to void was assessed. Rating was set as the dependent variable, while condition and time-point were set as independent variables. Random subject effects were included in this model. In case of a non-significant interaction effect, the main effects of condition and time-point were tested in separate models. The Wilcoxon signed-rank test was used for pairwise comparison of post-measurement urine volume between both fMRI measurements.

Preprocessing of neuro-imaging data
Functional and anatomical data were pre-processed and analyzed in MATrix LABoratory (MATLAB) version 2016b (The MathWorks, Inc., Natick, Massachusetts, USA) using Statistical Parametric Mapping (SPM) version 8 (Wellcome Trust Center for Neuroimaging, University College London, London, UK). First, T1-weighted images were reoriented to set the image origin to the anterior commissure (x = 0, y = 0, and z = 0) within the Montreal Neurological Institute (MNI) standard space. The functional images were then realigned to the first scan, unwarped to correct for movement and susceptibility-induced image distortions, and temporally smoothed. Following co-registration of the anatomical and functional images, spatial normalisation of the functional images was performed (Friston et al., 1995), using a unified segmentation approach to normalize individual brains to the MNI standard space. Finally, spatial smoothing was conducted by applying an isotropic 8-mm full-width-at-half-maximum Gaussian kernel to reduce image noise. Temporal band pass filtering (128 s) was additionally applied to each data set. Prior to data analysis, fMRI data from each participant were visually screened for movement artefacts (comparing translation and rotation values over time).
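The 8 mm FWHM kernel mentioned above corresponds to a Gaussian standard deviation of σ = FWHM / (2√(2 ln 2)), about 3.4 mm. The sketch below computes σ and builds a normalised 1D kernel; because an isotropic Gaussian is separable, applying such a kernel along each axis in turn is equivalent to 3D smoothing. The 2 mm voxel size of the normalised images is an assumption (it is not stated in the text), and this is an illustration of the principle, not SPM's implementation.

```python
import math


def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_kernel_1d(fwhm_mm, voxel_mm, radius_sd=3.0):
    """Normalised 1D Gaussian kernel sampled on a voxel grid, truncated at radius_sd sigmas."""
    sigma_vox = fwhm_to_sigma(fwhm_mm) / voxel_mm
    half = int(math.ceil(radius_sd * sigma_vox))
    weights = [math.exp(-0.5 * (i / sigma_vox) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


sigma_mm = fwhm_to_sigma(8.0)                   # about 3.40 mm for the 8 mm kernel
kernel = gaussian_kernel_1d(8.0, voxel_mm=2.0)  # 2 mm voxels: assumption
print(round(sigma_mm, 2), len(kernel))
```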

Analysis of neuro-imaging data
A voxel-wise general linear model (GLM) (Friston et al., 1994) was used for the first-level (i.e. within-participant) analysis to calculate contrast images (i.e. INFUSION minus START) for each fMRI measurement separately. Movement parameters were included as regressors of no interest in the first-level GLM. Significant changes in BOLD signal were identified using a repeated boxcar model convolved with a canonical hemodynamic response function (HRF) (Friston et al., 1994). The second-level analysis was performed to identify the pattern of activation for the entire group. Results were corrected for age, sex and total intracranial volume (included as covariates of no interest in the GLM) to account for between-participant variability. One-sample t-tests were conducted to reveal BOLD signal changes for the contrast INFUSION minus START at each time point (i.e. 1st and 2nd fMRI measurement). A paired t-test was used to compare each contrast between both time points (i.e. between both fMRI measurements). Results of the whole brain analysis are shown at p < 0.05 (cluster-corrected, k ≥ 41 voxels) (Slotnick et al., 2003).
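The first-level model above can be illustrated for a single voxel: boxcars for INFUSION and START are convolved with an HRF, betas are estimated by least squares, and the contrast INFUSION minus START is the difference of the two betas. This is a minimal sketch under several assumptions: a double-gamma HRF (shapes 6 and 16, undershoot ratio 1/6, values commonly used to approximate SPM's canonical HRF), START modelled as its own regressor, hypothetical onsets taken from the mean block timing, 441 analysed scans, and movement regressors and filtering omitted. All function names are hypothetical; this is not SPM code.

```python
import math


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density; zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    return t ** (shape - 1.0) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def double_gamma_hrf(t):
    """Positive lobe (mode 5 s) minus a scaled undershoot (mode 15 s)."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0


def convolved_boxcar(onsets, duration, n_scans, tr, dt=0.1, hrf_len=32.0):
    """Boxcar (onsets/duration in s) convolved with the HRF, sampled once per TR."""
    n_fine = int(round(n_scans * tr / dt))
    box = [0.0] * n_fine
    for onset in onsets:
        start = int(round(onset / dt))
        stop = min(n_fine, int(round((onset + duration) / dt)))
        for i in range(start, stop):
            box[i] = 1.0
    kernel = [double_gamma_hrf(j * dt) * dt for j in range(int(round(hrf_len / dt)))]
    conv = [0.0] * n_fine
    for i, b in enumerate(box):
        if b:
            for j, h in enumerate(kernel):
                if i + j >= n_fine:
                    break
                conv[i + j] += b * h
    step = int(round(tr / dt))
    return [conv[s * step] for s in range(n_scans)]


def ols(columns, y):
    """Least-squares betas from the normal equations (Gauss-Jordan with pivoting)."""
    p = len(columns)
    m = []
    for i in range(p):
        row = [sum(a * b for a, b in zip(columns[i], columns[j])) for j in range(p)]
        row.append(sum(a * b for a, b in zip(columns[i], y)))
        m.append(row)
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(p):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [a - f * b for a, b in zip(m[r], m[c])]
    return [m[i][p] / m[i][i] for i in range(p)]


# Hypothetical timing: START at 0-60 s, INFUSION blocks every ~94 s from 83 s.
TR, N_SCANS = 2.0, 441
x_inf = convolved_boxcar([83.0 + 94.0 * b for b in range(8)], 15.0, N_SCANS, TR)
x_start = convolved_boxcar([0.0], 60.0, N_SCANS, TR)
intercept = [1.0] * N_SCANS
y = [2.0 * a + 0.5 * s + 100.0 for a, s in zip(x_inf, x_start)]  # noiseless toy voxel
b_inf, b_start, b0 = ols([x_inf, x_start, intercept], y)
contrast = b_inf - b_start  # contrast vector [1, -1, 0]: INFUSION minus START
```

On this noiseless toy voxel the betas are recovered exactly, so the contrast equals 2.0 − 0.5 = 1.5; with real data, noise, nuisance regressors and filtering would all enter the model.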
Fig. 2. Scan paradigm of the task-related functional magnetic resonance imaging (fMRI). Prior to the start of the task-related fMRI, the bladder was manually prefilled via a Foley catheter until a persistent desire to void was present. The actual functional paradigm begins with the baseline condition START (60 s, no specific stimulus or task is performed), followed by a baseline rating of desire to void and a short REST condition randomly shifting between 7 and 9 s (jitter). Subsequently, eight repetitive blocks were applied comprising the following conditions: (1) automated INFUSION of 100 mL warm saline, (2) PLATEAU condition (full bladder at stable volume, no further infusion), (3) RATING of desire to void, (4) REST condition randomly shifting between 7 and 9 s (jitter), (5) automated WITHDRAWAL of 100 mL, (6) PLATEAU condition (bladder at stable volume, no further withdrawal), (7) RATING of desire to void, and (8) REST condition randomly shifting between 7 and 9 s (jitter).
Two types of statistics scale the measurement error differently. One is the CV, in which the error variance is scaled by the magnitude of activation. The CV (in %) was calculated across repeated measurement sessions for each individual using a formula in which mean_i denotes the mean of measurement i, SD_i the standard deviation of measurement i, n_i the number of trials at measurement i, and k the total number of measurements. The other is the ICC. First, we compared BOLD signal similarities between the 1st and 2nd fMRI across the whole brain at a voxel-to-voxel level. Then, we calculated reliability maps for the third ICC type (ICC(3,1)) according to Shrout and Fleiss (1979):
$$\mathrm{ICC}(3,1) = \frac{BMS - EMS}{BMS + (k-1)\,EMS}$$
This equation estimates the correlation of the participants' signal intensities between sessions, modelled by a two-way ANOVA with random participant effects and fixed session effects. In this model, the total sum of squares is partitioned into participant, session and error components, yielding the between-participant (BMS), session (JMS) and error (EMS) mean squares; k is the number of repeated sessions. The main advantage of ICC(3,1) is that it assesses only the level of consistency between measurements (McGraw and Wong, 1996). ICC(3,1) is fully determined by the group effect (F = BMS/EMS), since dividing numerator and denominator by EMS gives ICC(3,1) = (F − 1)/(F + k − 1). Session effects (F = JMS/EMS) and the group activation for the 1st session are considered as distinct factors. The latter, the group signal, is estimated by the one-sample t-statistic
$$t = \frac{\bar{X}_1}{s_{X_1}/\sqrt{n}}$$
where X_1 is the 1st session data, \bar{X}_1 and s_{X_1} are its mean and standard deviation across participants, and n is the number of participants. A possible relationship between group activation and reliability was assessed using the joint probability distribution f(ICC, t) (Caceres et al., 2009).
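The CV described above combines the means, standard deviations and trial counts of the k measurements. One common way to do this is a pooled CV: the pooled SD (weighted by n_i − 1) divided by the grand mean (weighted by n_i). The sketch below uses this pooled form as an assumption; the exact expression used in this study may differ, and `pooled_cv_percent` is a hypothetical name.

```python
import math


def pooled_cv_percent(means, sds, ns):
    """Pooled coefficient of variation (%) across k measurements.

    Assumed form: pooled SD (weights n_i - 1) divided by the grand mean
    (weights n_i). The sign follows the grand mean, so negative mean beta
    weights yield negative CVs.
    """
    if not (len(means) == len(sds) == len(ns)):
        raise ValueError("means, sds and ns must all have length k")
    dof = sum(n - 1 for n in ns)
    pooled_sd = math.sqrt(sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds)) / dof)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / sum(ns)
    return 100.0 * pooled_sd / grand_mean


# Example: two sessions with 8 blocks each (values are illustrative only).
print(round(pooled_cv_percent([1.0, 1.2], [0.2, 0.2], [8, 8]), 2))
```

Because the denominator is a signed mean, an ROI whose mean beta weight is negative produces a negative CV, which is consistent with the negative CV values reported in the results.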
Subsequently, ICC was calculated (in SPSS) at ROI level using a two-way mixed model to investigate the agreement between parameter estimates (beta weights), averaged across all scan blocks for the contrast "INFUSION minus START" within each a priori ROI, between the two scan time points t1 (1st fMRI) and t2 (2nd fMRI). Results are reported for absolute agreement [rater 1 (data from t1) "agrees" with rater 2 (data from t2), single measures] with 95% confidence intervals.
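Both ICC variants can be computed from the mean squares of a two-way ANOVA on a participant-by-session matrix of ROI beta weights. The sketch below implements ICC(3,1) (consistency, Shrout and Fleiss, 1979) and ICC(A,1) (absolute agreement, McGraw and Wong, 1996); to our understanding the latter corresponds to the "two-way mixed, absolute agreement, single measures" option in SPSS, but that mapping and the function names are assumptions, and confidence intervals are omitted.

```python
def two_way_mean_squares(data):
    """Mean squares of a two-way ANOVA without replication.

    data: n participant rows, each with k session values.
    Returns (BMS, JMS, EMS) for participants, sessions and residual error.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    return ss_rows / (n - 1), ss_cols / (k - 1), ss_error / ((n - 1) * (k - 1))


def icc_consistency(data):
    """ICC(3,1): consistency, single measures."""
    bms, _, ems = two_way_mean_squares(data)
    k = len(data[0])
    return (bms - ems) / (bms + (k - 1) * ems)


def icc_absolute_agreement(data):
    """ICC(A,1): absolute agreement, single measures (session offsets penalised)."""
    bms, jms, ems = two_way_mean_squares(data)
    n, k = len(data), len(data[0])
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Second session = first session + 1: perfect consistency, imperfect agreement.
betas = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(icc_consistency(betas), round(icc_absolute_agreement(betas), 3))
```

The example shows the design difference: a constant shift between sessions leaves ICC(3,1) at 1, whereas the absolute-agreement ICC drops to about 0.667 because it counts the session effect as disagreement.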
To gain insight into the spatial congruence of activations at ROI level, the relative overlap of activations within each ROI between t1 and t2 was determined by calculating the Sørensen-Dice index for pairs of 2nd-level group activation maps (from the two scan time points t1 and t2):
$$R_{\text{overlap}} = \frac{2\,V_{\text{overlap}}}{V_1 + V_2}$$
where V_overlap represents the number of voxels commonly activated within an ROI at t1 and t2, and V_1 and V_2 represent the number of voxels activated at t1 and t2, respectively. The index was calculated from activation maps thresholded at p ≤ 0.001 (uncorrected for multiple comparisons). The Sørensen-Dice index ranges from 0 (no overlap) to 1 (perfect overlap) and is independent of the height of the t-values once voxels have passed the threshold.
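The overlap index above reduces to set operations on the suprathreshold voxels of the two maps. In this sketch the maps are flat lists of t-values for the voxels of one ROI, in the same voxel order; the t threshold corresponding to p ≤ 0.001 depends on the degrees of freedom of the group model and could be obtained, for example, with `scipy.stats.t.isf(0.001, df)`. Returning 0 when neither map has active voxels is an assumed convention, and `dice_index` is a hypothetical name.

```python
def dice_index(map_t1, map_t2, threshold):
    """Sørensen-Dice overlap 2 * V_overlap / (V1 + V2) of two thresholded maps."""
    active1 = {i for i, v in enumerate(map_t1) if v >= threshold}
    active2 = {i for i, v in enumerate(map_t2) if v >= threshold}
    total = len(active1) + len(active2)
    if total == 0:
        return 0.0  # undefined without active voxels; 0 is an assumed convention
    return 2.0 * len(active1 & active2) / total


# Toy ROI with four voxels: three active at each time point, two shared.
print(round(dice_index([4.0, 3.9, 1.0, 5.2], [3.7, 0.5, 4.1, 5.0], 3.5), 3))
```

Because only threshold crossings matter, two maps with very different t-values above the threshold still obtain the same index, which is the property noted in the text.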
In addition to the ICC, Bland-Altman 95% limits of agreement were used to visualise the agreement between the parameter estimates (beta weights), averaged across all scan blocks for the contrast "INFUSION minus START" within the selected ROIs (denoted S), between the two scan time points t1 and t2:
$$\bar{d} \pm 1.96\, s_d, \qquad d = S_{t1} - S_{t2}$$
where \bar{d} and s_d are the mean and standard deviation of the between-session differences across participants. Subsequently, a linear regression analysis was performed for each ROI, with the difference in activation between t1 and t2 as the dependent variable and the mean activation of t1 and t2 as the independent variable. A significant effect (t-test of the regression coefficient) would indicate a proportional bias, i.e. that the differences between t1 and t2 are not evenly distributed around the mean difference but vary systematically with the magnitude of activation.
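The Bland-Altman limits and the proportional-bias regression described above can be sketched with the standard library. Here `s_t1` and `s_t2` are hypothetical per-participant mean beta weights of one ROI at the two time points; the slope's t statistic has n − 2 degrees of freedom, and converting it to a p-value (omitted here) would require a t distribution, e.g. from SciPy.

```python
import math
import statistics


def bland_altman(s_t1, s_t2):
    """Mean difference (bias) and 95% limits of agreement between two sessions."""
    diffs = [a - b for a, b in zip(s_t1, s_t2)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def proportional_bias(s_t1, s_t2):
    """Regress difference on mean; return slope and its t statistic (df = n - 2)."""
    diffs = [a - b for a, b in zip(s_t1, s_t2)]
    means = [(a + b) / 2.0 for a, b in zip(s_t1, s_t2)]
    n = len(diffs)
    mx, my = statistics.mean(means), statistics.mean(diffs)
    sxx = sum((x - mx) ** 2 for x in means)
    slope = sum((x - mx) * (y - my) for x, y in zip(means, diffs)) / sxx
    intercept = my - slope * mx
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(means, diffs))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, slope / se


t1 = [1.0, 2.0, 3.0, 4.0]  # illustrative beta weights, not study data
t2 = [1.1, 1.8, 3.3, 3.9]
print(bland_altman(t1, t2), proportional_bias(t1, t2))
```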
Finally, for each participant, we extracted the BOLD signal time courses of both scan time points for each ROI and filling block. For this analysis, we used the pre-processed fMRI data (including realignment, unwarping, normalisation, spatial smoothing, and temporal band pass filtering (128 s)). A repeated-measures ANOVA (using the mean activity across all time points within each block) with the factors time and block was applied to test for main effects of scan time point and block as well as their interaction. In case of a main effect of block, post-hoc paired t-tests were used to compare the fMRI signal between blocks within each scan time point. In case of a significant main effect of time or a time × block interaction, paired t-tests would be applied to compare the pre-processed fMRI signal of each block between t1 and t2.
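The block-wise comparison above starts by averaging each ROI time course within each INFUSION block and then compares matched values. The sketch covers only those two steps, block averaging and the paired t statistic used for post-hoc comparisons; the repeated-measures ANOVA itself is not reproduced. Onsets are given in scans (onset in seconds divided by TR), and the helper names are hypothetical.

```python
import math
import statistics


def block_means(timecourse, onsets_scans, length_scans):
    """Mean signal within each block; onsets and block length are given in scans."""
    return [statistics.mean(timecourse[o:o + length_scans]) for o in onsets_scans]


def paired_t(x, y):
    """Paired t statistic (df = n - 1) for matched samples, e.g. one block at t1 vs t2."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


# Usage: for block b, collect block_means(...)[b] of every participant at t1 and t2,
# then call paired_t(values_t1, values_t2).
print(block_means([0, 1, 2, 3, 4, 5, 6, 7], [0, 4], 2))
```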

Participants
Twenty right-handed healthy participants (10 women and 10 men, mean age 39 ± 12 years, range 22-54) were included in this study. In accordance with our inclusion criteria, no pathologic findings were detected in the 3-day bladder diary, questionnaires or neuro-urological examination (Table 1). All participants completed the study. No adverse events were reported during or after the MR measurements or at follow-up.

Desire to void and post fMRI bladder volume
The prefilled bladder volume prior to each fMRI measurement was 453 ± 153 mL. The linear mixed model revealed a non-significant interaction effect (condition × time-point) on the rating of desire to void (F = 2.9, df: 2, p = 0.061). In the subsequent main-effect models, a significant main effect of condition on the desire to void rating was detected (F = 126.1, df: 2, p < 0.001). In contrast, there was no significant effect of time-point (i.e. visit) on the desire to void rating (F = 1.4, df: 1, p = 0.237). Significant differences between all conditions were found for each fMRI measurement. INFUSION induced the highest rating of desire to void (1st fMRI: 7.9 ± 1.5; 2nd fMRI: 7.8 ± 1.1), while WITHDRAWAL resulted in the lowest rating (4.5 ± 1.6 and 3.4 ± 1.8, respectively). The baseline rating after bladder prefilling (START) was 6.3 ± 0.6 and 6.2 ± 0.5, respectively.

Supraspinal responses to automated, repetitive LUT stimulation
Head motion (translation) was below 1.5 mm (i.e. half of the acquired voxel size) for each participant and fMRI run; hence, all participants were included in the analysis. At whole brain level, significant BOLD signal changes to bladder filling (i.e. contrast INFUSION minus START) were detected in several brain areas during both fMRI measurements (Table 2). Results of the whole brain analysis are shown at p < 0.05 (cluster-corrected) (Slotnick et al., 2003).
The brain areas from the 1st and 2nd fMRI measurement listed in Table 2, and their spatial overlap, are illustrated in Fig. 3.

Reliability analysis
The whole brain ICC across both time points is shown in Fig. 4. Some voxel locations showed an excellent ICC (for illustration purposes, only voxels with an ICC ≥ 0.75 are reported).
At ROI level, areas of significant (p = 0.05, FWE corrected) BOLD response to bladder filling within each of the 12 selected ROIs are summarized in Table 3. Significant BOLD responses were detected in 11 of 12 ROIs during the 1st fMRI and in 8 of 12 ROIs during the 2nd fMRI. The following eight ROIs demonstrated significant BOLD responses during both visits (Table 3): DLPFC, OFC, VLPFC, ACC, left insula, right insula, hypothalamus, and left basal ganglia. The ICC analysis, based on the mean beta weight of these eight ROIs, revealed values between −0.33 (ACC) and 0.55 (right insula).
The relative overlap of activations between 1st and 2nd fMRI, calculated using the Sørensen-Dice index, showed the highest voxel overlap, i.e. the highest spatial congruence of activations, for the left (R_overlap = 0.55) and right (R_overlap = 0.44) insular cortex (Table 3).
The individual (per participant) CV between both fMRI measurements, using the mean beta weights of each ROI with significant BOLD responses in both fMRI measurements, is displayed in Fig. 5. The CV ranged between −20.8% (left basal ganglia) and 26.3% (hypothalamus) (Table 4). The right insula and VLPFC showed the smallest CV spread in conjunction with a low median CV (Table 4).
The Bland and Altman 95% limits of agreement of the beta weights within each of those eight ROIs in relation to the scan time points are shown in Fig. 6. None of the eight ROIs showed a proportional bias (all p > 0.05).
BOLD signal responses for each of the eight INFUSION blocks are shown in Fig. 7. Repeated-measures ANOVA did not reveal a main effect of block (at t1 or t2), a main effect of time, or a time × block interaction.

Discussion
The results of this study demonstrate that, using our automated, MR-synchronized bladder filling protocol, we were able to elicit significant supraspinal responses during two separate fMRI measurements. Eight of 12 predefined ROIs showed significant BOLD responses during both fMRI measurements: DLPFC, OFC, VLPFC, ACC, left insula, right insula, hypothalamus, and left basal ganglia. The right insular cortex and the VLPFC are the supraspinal ROIs with the most reliable response to our automated bladder filling task.
The essential role of the brain in the control of the LUT is beyond controversy, and neuroimaging studies have greatly helped to visualise and better understand the supraspinal correlates of LUT control in physiological as well as pathophysiological contexts (Blok, 2002; Holstege, 2005; Fowler et al., 2008). Despite this gain in knowledge, clinical practice has not yet changed much in this regard. To increase the impact of functional neuroimaging on clinical practice, it is crucial to be able to reliably assess LUT treatment effects at the supraspinal level, both to better understand the mechanism of action and to evaluate efficacy, including reasons for failure. Several neuroimaging studies have investigated whether treatment-related improvement in LUT control correlates with supraspinal responses (Blok et al., 2006; Kavia et al., 2010; Pontari et al., 2010; Griffiths et al., 2015). However, it remains questionable whether task-specific brain activity acquired with fMRI is reliable and, consequently, to what extent a change in supraspinal response is treatment related.
Table 2 Whole brain (voxel-wise) analysis of supraspinal areas with significant (p < 0.05, cluster-corrected, k ≥ 41 voxels) BOLD signal changes during bladder filling (i.e. contrast INFUSION minus START). The asterisk (*) indicates brain regions that show an overlap between both fMRI sessions (see Fig. 3). In this regard, it has to be considered that the orbitofrontal cortex cluster of the 1st fMRI is a large cluster of 3175 voxels that spreads widely beyond its peak activation, covering adjacent structures.
Fig. 3. Group-level maps (axial slices) of brain areas with significantly (p < 0.05, cluster-corrected) increased BOLD signal during bladder filling (i.e. contrast "INFUSION minus START") from 1st fMRI (red) and 2nd fMRI (green). Overlap between identified brain areas is depicted in yellow. fMRI functional magnetic resonance imaging; IPL inferior parietal lobe; OFC orbitofrontal/frontopolar prefrontal cortex; VLPFC ventrolateral prefrontal cortex.
To investigate, for the first time, the test-retest reliability of BOLD signal responses to a bladder filling task across two separate fMRI measurements (5-8 weeks apart), we applied a block design and used an automated, MR-synchronized IDD (Jarrahi et al., 2016; Leitner et al., 2017) to achieve maximum consistency of task delivery. The interval between both fMRI measurements (visits 2 and 3), i.e. 5-8 weeks, is in line with the time window in which maximum treatment efficacy is expected when treating individuals with LUT dysfunction (Siami et al., 2002; Karsenty et al., 2008) and in which follow-up assessments are performed in clinical trials (Maman et al., 2014). In addition, this interval allowed female participants to be investigated at the same time point of their menstrual cycle.
Fig. 4. Results of the whole brain intraclass correlation coefficient (ICC) analysis between the two scan time points for the contrast "INFUSION minus START". Only regions with an ICC ≥ 0.75 are shown.
Table 3 ROI-specific BOLD signal response during bladder filling ("INFUSION minus START") for ROIs showing significant activation (p = 0.05, FWE corrected). The ICC represents a reliability measure (values ranging from 0 = no reliability to 1 = perfect reliability) of an ROI's mean beta weight between 1st and 2nd fMRI. The relative voxel overlap (values ranging from 0 = no overlap to 1 = complete overlap) between 1st and 2nd fMRI, calculated using the Sørensen-Dice index, represents a measure of spatial congruency based on activation maps thresholded at p ≤ 0.001 (uncorrected for multiple comparisons).
ROI | MNI x, y, z (1st fMRI) | MNI x, y, z (2nd fMRI) | ICC | Sørensen-Dice index
Thalamus | 15, −25, 7 | n/a | n/a | n/a
8 PAG | 12, −16, −8 | n/a | n/a | n/a
9 Pons | −9, −19, −32 | n/a | n/a | n/a
10a Cerebellum (left) | n/a | n/a | n/a | n/a
10b Cerebellum (right) | 30, −52, −47 | n/a | n/a | n/a
11a Basal ganglia (left) | −27, −15 | | |
Basal ganglia (right) | n/a | 30, 11, 10 | n/a | n/a
12 SMA | n/a | −9, 14, 61 | n/a | n/a
ACC anterior cingulate cortex; BA Brodmann area; BOLD blood oxygenation level-dependent; DLPFC dorsolateral prefrontal cortex; fMRI functional magnetic resonance imaging; FWE family-wise error; ICC intraclass correlation coefficient; MNI Montreal Neurological Institute; OFC orbitofrontal/frontopolar prefrontal cortex; PAG periaqueductal grey; ROI region of interest; SMA supplementary motor area; VLPFC ventrolateral prefrontal cortex.
With this set-up, we were able to demonstrate significant task-related BOLD signal responses at both fMRI visits at p < 0.05 (corrected, using a cluster-extent threshold of k > 41 voxels; Slotnick et al., 2003). The intensity of the response to LUT stimulation either exceeded or was similar to that previously reported (Griffiths et al., 2005, 2007; Zhang et al., 2005; Di Gangi Herms et al., 2006; Tadic et al., 2008, 2010; Pontari et al., 2010).
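The cluster-extent correction mentioned above discards supra-threshold clusters that contain too few contiguous voxels. The Python sketch below applies such an extent threshold (k > 41) to a toy 3D statistic map. It assumes 26-connectivity and an arbitrary voxel-level threshold, neither of which is specified here, and it does not reproduce the Monte Carlo estimation of k described by Slotnick et al. (2003); all function names are hypothetical.

```python
from collections import deque
from itertools import product

# 26-connectivity: neighbours sharing a face, edge or corner (assumption).
NEIGHBOUR_OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def suprathreshold_voxels(stat_map, threshold):
    """Return the set of (x, y, z) indices whose statistic exceeds threshold.

    stat_map is a nested list indexed as stat_map[x][y][z].
    """
    return {
        (x, y, z)
        for x, plane in enumerate(stat_map)
        for y, row in enumerate(plane)
        for z, value in enumerate(row)
        if value > threshold
    }


def find_clusters(voxels):
    """Group supra-threshold voxels into connected clusters (breadth-first search)."""
    remaining = set(voxels)
    clusters = []
    while remaining:
        seed = remaining.pop()
        cluster, queue = [seed], deque([seed])
        while queue:
            x, y, z = queue.popleft()
            for dx, dy, dz in NEIGHBOUR_OFFSETS:
                neighbour = (x + dx, y + dy, z + dz)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    cluster.append(neighbour)
                    queue.append(neighbour)
        clusters.append(cluster)
    return clusters


def cluster_extent_filter(stat_map, threshold, k=41):
    """Keep only clusters containing more than k voxels."""
    clusters = find_clusters(suprathreshold_voxels(stat_map, threshold))
    return [c for c in clusters if len(c) > k]
```

For example, a toy map with one 4 × 4 × 4 block (64 voxels) and one separate 2 × 2 × 2 block (8 voxels) above threshold retains only the 64-voxel cluster.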
Although the MNI peak coordinates of the two fMRI measurements (Table 2) did not correspond closely (except for the left inferior parietal lobe, BA 39), Fig. 3 demonstrates regional consistency, with overlapping significant clusters in the right OFC, right insula, right VLPFC, and right inferior parietal lobe.
Using the voxel-based whole brain ICC analysis to quantify the reliability of voxels between measurements (Fig. 4), a large number of voxels in brain areas attributed to the SSN of LUT control showed excellent reliability (ICC > 0.75). This would be an ideal result; however, it somewhat conflicts with the findings shown in Table 2. Moreover, voxel-based whole brain ICC analysis does not reflect the BOLD signal amplitude. Hence, as reported previously (Caceres et al., 2009; Plichta et al., 2012), some voxels show high reliability but low signal intensity (i.e. small t-values). Such voxels may reflect a task-related response that is related to the stimulus only indirectly or non-linearly (Caceres et al., 2009; Plichta et al., 2012).
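As a minimal sketch of a voxel-wise test-retest analysis, the Python code below computes an ICC for each voxel from the per-participant beta weights of the two sessions and keeps voxels with ICC ≥ 0.75. It assumes the consistency form ICC(3,1) of Shrout and Fleiss (two-way mixed effects, single measures); this is an assumption, since the ICC variant is not stated in this section, and the data layout and function names are hypothetical. The final check in the usage illustrates the point made above: the ICC is unchanged when all beta weights are scaled down, so a voxel can be highly reliable while its signal amplitude is small.

```python
def icc_3_1(session1, session2):
    """ICC(3,1): two-way mixed, consistency, single measures, for k = 2 sessions.

    session1[i] and session2[i] are the beta weights of participant i.
    """
    n, k = len(session1), 2
    pairs = list(zip(session1, session2))
    grand_mean = (sum(session1) + sum(session2)) / (n * k)
    subject_means = [(a + b) / k for a, b in pairs]
    session_means = [sum(session1) / n, sum(session2) / n]

    ss_subjects = k * sum((m - grand_mean) ** 2 for m in subject_means)
    ss_sessions = n * sum((m - grand_mean) ** 2 for m in session_means)
    ss_total = sum((v - grand_mean) ** 2 for pair in pairs for v in pair)
    ss_error = ss_total - ss_subjects - ss_sessions

    ms_subjects = ss_subjects / (n - 1)        # between-subjects mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)


def reliable_voxels(betas_visit1, betas_visit2, cutoff=0.75):
    """Return {voxel: ICC} for voxels whose ICC reaches the cutoff.

    Each argument maps a voxel coordinate to the list of per-participant
    beta weights, in the same participant order for both visits.
    """
    result = {}
    for voxel, values1 in betas_visit1.items():
        icc = icc_3_1(values1, betas_visit2[voxel])
        if icc >= cutoff:
            result[voxel] = icc
    return result


# Usage: scaling every beta weight by 0.01 leaves the ICC unchanged.
a, b = [1.0, 2.5, 3.0, 4.2], [1.3, 2.1, 3.4, 4.0]
print(icc_3_1(a, b), icc_3_1([0.01 * v for v in a], [0.01 * v for v in b]))
```

Because ICC(3,1) measures consistency, a constant offset between sessions does not lower it; an absolute-agreement form such as ICC(2,1) would penalise such systematic shifts.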
To assess the reliability of both the task-related signal and its spatial allocation, 12 ROIs were selected based on the supraspinal areas most frequently reported to be involved in LUT control (Kavia et al., 2005; Griffiths and Tadic, 2008; Fowler and Griffiths, 2010; Arya et al., 2017).
Current working models of supraspinal LUT control, based on previous neuroimaging findings, suggest that ascending signals from the LUT are relayed through the PAG and thalamus to cortical areas, which further process the information and are involved in appropriate decision making and action taking (Fowler et al., 2008; Griffiths and Tadic, 2008; Fowler and Griffiths, 2010). These cortical areas specifically comprise the insula, the PFC, and the ACC. The insula, as the primary interoceptive cortex, is thought to perform a first-order mapping of visceral signals (e.g. bladder distension), whereas second-order regions such as the PFC and the ACC are involved in the subjective awareness and feeling of the visceral signal (e.g. desire to void) (Craig, 2002, 2009; Fowler et al., 2008). The PFC is involved in planning complex cognitive and appropriate social behaviour (e.g. handling a strong desire to void in different circumstances) as well as in attention and response-selection mechanisms (e.g. postponement of micturition), which are essential for the volitional control of the LUT (Pardo et al., 1991; Bechara et al., 2000; Kavia et al., 2005; Fowler and Griffiths, 2010). The PFC has strong and multiple connections to the cingulate cortex and hypothalamus, both of which are part of the limbic system and are considered to provide emotional and motivational input to the prefrontal decision making process. The ACC, for example, is thought to modulate how much attention is paid to signals arising from the bladder and how one reacts to them (Fowler et al., 2008). The hypothalamus, which is also associated with autonomic control through its projections to all autonomic preganglionic motor neurons in the spinal cord, including the sacral parasympathetic and sphincter motor nuclei, is one of the few regions with direct afferent projections to the pontine micturition center (Fowler et al., 2008; Griffiths and Tadic, 2008).
This connection has been interpreted as an additional "layer of control" that permits micturition in healthy subjects only if it is judged 'safe' to do so.
The SMA has mainly been observed in association with pelvic floor muscle contractions and may serve as an additional continence mechanism to prevent premature leakage in situations of strong desire to void or urgency (Zhang et al., 2005; Griffiths et al., 2015).
Although previous neuroimaging studies have reported cerebellar and basal ganglia activity in response to different LUT conditions, their role in LUT control is not yet clear (Zhang et al., 2005; Seseke et al., 2006). Both structures are known to be involved in motor activity but also in other functions such as cognition, attention, emotion, and behaviour (Wolf et al., 2009; D'Angelo, 2018; Florio et al., 2018). Lesions or disorders of the basal ganglia and cerebellum frequently seem to result in LUT dysfunction, predominantly detrusor overactivity and detrusor-sphincter dyssynergia (Dietrichs and Haines, 2002; Sakakibara et al., 2004, 2008; Winge et al., 2006; Chou et al., 2013). Hence, both structures seem to contribute facilitatory as well as inhibitory input to LUT control and may be involved in level-setting and fine-tuning processes of LUT control.
Fig. 5. Box plots (first quartile, median, third quartile, and mean marker (cross)) with whiskers (1.5 × interquartile range of the upper and lower quartile) and outliers of the individual coefficient of variation (CV) between visits of the mean beta weights of each ROI with significant BOLD response in both visits (Table 3). ACC anterior cingulate cortex, DLPFC dorsolateral prefrontal cortex, OFC orbitofrontal/frontopolar prefrontal cortex, VLPFC ventrolateral prefrontal cortex.
Although a high proportion of the selected ROIs demonstrated significant BOLD signal changes during our bladder filling task, we did not expect to find significant results in all 12 ROIs. This is because our ROI selection was based on combined findings from different studies, none of which reported significant responses in all ROIs within a single investigation. This, in turn, is at least partially due to the high diversity of outcome measures, study designs, bladder filling protocols, and scan protocols among functional neuroimaging studies investigating supraspinal LUT control.
In this study, 7 of the 12 selected ROIs (or 8 of 15 if bilateral ROIs are counted separately) showed significant BOLD signal changes in response to bladder filling at both fMRI visits. These 8 ROIs include essential parts of the interoceptive network, such as the hypothalamus, insula, ACC, and OFC (Craig, 2002). This fits well with the interoceptive nature and transmission of the bladder filling sensation. Furthermore, it is in line with previous reports on the supraspinal correlates of bladder filling sensations in healthy participants at larger bladder volumes, i.e. at a strong desire to void (Griffiths et al., 2007; Mehnert et al., 2008). However, the reliability of each ROI's mean beta weight was "only" poor to fair, with the two highest ICCs observed for the right insula (0.55) and right VLPFC (0.47).
Recently, one study investigated the reliability of BOLD responses to bladder filling across repeated measurements within the same session, i.e. the agreement between the first and second sequence of four blocks out of a total of eight blocks (Clarkson et al., 2017). Reliability (ICC) was computed for three preselected ROIs (right insula, dorsal ACC, and medial PFC). Although the data were recorded within the same session and showed no significant systematic change between the first and second sequence of blocks, only poor (0.19) to fair (0.44) reliability was achieved (Clarkson et al., 2017). These findings were interpreted as reflecting random variability due to the complex and emotionally influenced process of supraspinal LUT control (Clarkson et al., 2017). The authors concluded that the methods used to provoke supraspinal responses related to bladder filling need to be improved, e.g. by standardizing the scanning conditions and redesigning the stimulus protocol to yield larger brain responses (Clarkson et al., 2017).
Some of these suggested improvements had already been included in our study design, which may have contributed to the higher ICC values compared to Clarkson et al., even across a longer retest interval, which usually tends to lower reliability (Bennett and Miller, 2010). However, our results must be compared with those of Clarkson et al. with caution because of the different study populations (women with urgency incontinence vs healthy participants) and the age difference (≥60 years vs 39.2 ± 11.6 years). The latter, specifically, can affect brain hemodynamics and consequently BOLD responses (D'Esposito et al., 2003). In addition, individuals with urinary incontinence may show greater heterogeneity and variability of neuroimaging outcomes, depending on various factors such as the aetiology of incontinence, concomitant medications, and the presence of a potential underlying neurological disease. In the present study, the automated, MR-synchronized IDD helped to provide consistent and accurate task performance throughout the measurements (Jarrahi et al., 2016; Leitner et al., 2017). The applied protocol (Fig. 2) incorporated a randomly shifting time interval (jitter) of 7-9 s (mean 8 s) during each REST condition to avoid conditioning/habituation. In contrast to previous studies, using a larger INFUSION/WITHDRAWAL volume (i.e. 100 mL) resulted in a strong and distinctive stimulus, which is reflected by the participants' ratings and the time course of the BOLD signal (Fig. 7). This indicates consistent stimulation and BOLD amplitudes within and between the two visits/fMRI measurements, without any main effect of block or visit in the repeated-measures ANOVA.
Fig. 6. Bland-Altman plots demonstrating the 95% limits of agreement between the 1st and 2nd fMRI. The plots display the difference in beta weights between visits against the mean beta weight for each of the 8 ROIs that demonstrated a significant BOLD signal increase during both visits (Table 3). The middle line represents the mean. The upper and lower lines represent the upper and lower limits of agreement (mean ± 1.96 × SD), respectively. ACC anterior cingulate cortex, DLPFC dorsolateral prefrontal cortex, OFC orbitofrontal/frontopolar prefrontal cortex, VLPFC ventrolateral prefrontal cortex.
Fig. 7. Absolute INFUSION-related within-block blood oxygenation level-dependent (BOLD) signal variations relative to the mean BOLD activity across all blocks (= mean of all blocks − individual block) for each ROI and session (1st fMRI black; 2nd fMRI grey). The repeated measures ANOVA demonstrated no main effect of block or visit. ACC anterior cingulate cortex, DLPFC dorsolateral prefrontal cortex, OFC orbitofrontal/frontopolar prefrontal cortex, VLPFC ventrolateral prefrontal cortex.
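The jittered block design and the block-wise comparison shown in Fig. 7 can be sketched in Python as follows. The 15 s block length, the 7-9 s REST jitter with a mean of 8 s, and the "mean of all blocks − individual block" deviation come from the text; the cycle order (INFUSION, REST, WITHDRAWAL, REST) and the use of a continuous uniform distribution for the jitter are assumptions, since the full protocol is only given in Fig. 2. All names are hypothetical.

```python
import random

BLOCK_DURATION_S = 15.0            # INFUSION and WITHDRAWAL blocks (from the text)
REST_MIN_S, REST_MAX_S = 7.0, 9.0  # REST jitter range, mean 8 s (from the text)


def jittered_schedule(n_cycles, rng):
    """Return a list of (condition, onset_s, duration_s) tuples.

    Assumed cycle: INFUSION, REST, WITHDRAWAL, REST. Each REST duration is
    drawn from a uniform distribution on [7, 9] s, whose mean is 8 s.
    """
    events, t = [], 0.0
    for _ in range(n_cycles):
        for condition in ("INFUSION", "WITHDRAWAL"):
            events.append((condition, t, BLOCK_DURATION_S))
            t += BLOCK_DURATION_S
            rest = rng.uniform(REST_MIN_S, REST_MAX_S)
            events.append(("REST", t, rest))
            t += rest
    return events


def absolute_block_deviation(block_means):
    """|mean of all blocks - individual block| for one ROI and session (cf. Fig. 7)."""
    overall = sum(block_means) / len(block_means)
    return [abs(overall - value) for value in block_means]


for event in jittered_schedule(2, random.Random(42)):
    print(event)
```

Passing a seeded `random.Random` instance keeps the schedule reproducible across runs while still preventing participants from anticipating the next block onset.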
Whether an ICC of 0.55 (as demonstrated for the right insula ROI) is sufficient to compare supraspinal LUT control before and after treatment remains to be elucidated. According to the ICC classification of Cicchetti (1994), our findings do not exceed the level of fair reliability, which may be insufficient to differentiate true treatment effects from random variability. On the other hand, our results are well within the range of previous investigations on fMRI reliability, which reported ROI-based ICC values from 0.16 to 0.88 with an overall mean of 0.52 (Bennett and Miller, 2010). In these studies, simpler motor and sensory tasks seem to be more reliable than more complex tasks involving interoception. This is not surprising: simple motor, visual, or auditory tasks engage one distinct principal area such as the primary motor, visual, or auditory cortex, respectively, whereas more complex tasks require multiregional involvement, e.g. the SSN in the case of LUT control (Fowler et al., 2008). Such networks may undergo natural changes over time, be more variable between participants, and be more likely influenced/altered by the current motivational and affective state of the participants during measurements (Stern et al., 2017). This may at least partly explain the poor reliability of ACC activity, despite significant activity in the ACC ROI during both fMRI measurements (Table 3). Although we neither expected nor found evidence of habituation to our task over the period between the two fMRI measurements, it has to be considered that, despite all pre-scan instructions and accommodation time in the scanner, the 1st fMRI measurement was an emotionally "newer" experience than the 2nd, which may have affected our results. To better control for this effect, it may be necessary to add a 3rd fMRI measurement at the beginning as an accommodation scan that is not included in the reliability analysis.
However, this may be extremely challenging in regard to participant/patient compliance, scanner time availability, and budget.
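The reliability categories used above follow Cicchetti (1994): an ICC below 0.40 is poor, 0.40-0.59 fair, 0.60-0.74 good, and 0.75 or above excellent. The small Python helper below (hypothetical name) makes this mapping explicit; applied to the values discussed in this section, the right insula (0.55) and right VLPFC (0.47) both fall in the fair range, as does the overall mean of 0.52 reported by Bennett and Miller (2010).

```python
def cicchetti_category(icc):
    """Map an ICC value to Cicchetti's (1994) reliability categories."""
    if icc < 0.40:
        return "poor"  # also covers negative ICCs
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


for label, value in [("right insula", 0.55), ("right VLPFC", 0.47),
                     ("Clarkson et al., lowest", 0.19),
                     ("Bennett and Miller, mean", 0.52)]:
    print(f"{label}: {value:.2f} -> {cicchetti_category(value)}")
```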
Assessment of supraspinal LUT control is certainly a more complex task, including not only the ability to be aware of sensation but also how sensations are interpreted, regulated, and used to inform behaviour, with different dimensions relating to different aspects of health and disease (Stern et al., 2017). These processes involve several brain areas, including areas of interoceptive, emotional, and cognitive processing that may also overlap in function (Fowler et al., 2008). Although variability within and between participants is to be expected (Fig. 5) in several areas of the supraspinal LUT control network, some areas show smaller variability than others, such as the right insula and the VLPFC (Table 4, Fig. 5). This corresponds well to the narrower limits of agreement in the Bland-Altman plots of those ROIs and to their ICC values, which were the highest among the selected ROIs. In addition, the right insula demonstrated considerable spatial congruency between both fMRI measurements (Sørensen-Dice index of 0.44 = 44% voxel overlap within the ROI, Table 3). Other ROIs, such as the left insula or hypothalamus, demonstrated even higher spatial congruency, although their mean beta weight ICCs were negative (Table 3). Again, this discrepancy is related to the different thresholds and foci of the two analyses (ICC vs Sørensen-Dice index). Voxels may show high task-related spatial congruency while the agreement between their parameter estimates (beta weights) is poor, owing to their indirect or non-linear relation to the stimulus (Caceres et al., 2009; Plichta et al., 2012). Hence, of all investigated ROIs, the right insular cortex appears to have the best congruency in both BOLD signal location and validity.
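The three agreement measures contrasted in this paragraph (within-subject CV, Bland-Altman limits of agreement, and Sørensen-Dice index) can be sketched in Python with the standard library only. Assumptions: the CV uses the sample SD of the two visits divided by the absolute mean (beta weights can be negative), Bland-Altman differences are taken as 1st minus 2nd visit, and the Dice index is computed on sets of supra-threshold voxel coordinates within an ROI. These choices and the function names are illustrative and not necessarily identical to the study's pipeline.

```python
import math
import statistics


def within_subject_cv(visit1, visit2):
    """Per-participant coefficient of variation (%) of an ROI's mean beta weight."""
    cvs = []
    for a, b in zip(visit1, visit2):
        mean = (a + b) / 2
        sd = statistics.stdev([a, b])  # sample SD of the two visits
        # Undefined for a zero mean; reported as infinity in this sketch.
        cvs.append(math.inf if mean == 0 else 100 * sd / abs(mean))
    return cvs


def bland_altman_limits(visit1, visit2):
    """Return (lower limit, bias, upper limit) = bias -/+ 1.96 * SD of the differences."""
    diffs = [a - b for a, b in zip(visit1, visit2)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias - 1.96 * sd, bias, bias + 1.96 * sd


def sorensen_dice(voxels1, voxels2):
    """2 |A ∩ B| / (|A| + |B|); 0 = no overlap, 1 = complete overlap."""
    a, b = set(voxels1), set(voxels2)
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both maps are empty
    return 2 * len(a & b) / (len(a) + len(b))
```

Because the Dice index only counts which voxels pass the threshold, while the ICC and Bland-Altman limits compare the beta weights themselves, an ROI can score high on the first and poorly on the other two, which is the discrepancy described above for the left insula and hypothalamus.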
Given the prominent role attributed to the right insula in interoception (Craig, 2002, 2009; Jarrahi et al., 2015), it seems consistent that this area is the one most reliably involved in a LUT stimulation task. Although we could demonstrate fair reliability specifically for the right insula, there is room for further improvement, and the results need to be confirmed in individuals with LUTD. However, to improve fMRI protocols investigating task-specific responses of the SSN involved in LUT control, we need to better understand the factors that influence reliability in our current task designs. In general, increasing the scanning time or the number of task repetitions (i.e. to increase the sample size) and/or the field strength (to achieve a better signal-to-noise ratio) could help to improve reliability, but only with adequate scanner maintenance/quality control and task design (Bennett and Miller, 2010).
In conclusion, achieving high reliability with a viscero-sensory/interoceptive task such as repetitive bladder filling is challenging. Applying an automated, MR-synchronized task with a stronger stimulus (±100 mL INFUSION/WITHDRAWAL) than previously used, we found in a cohort of healthy participants that the right insula and right VLPFC are the most reliable brain regions for our protocol. This finding suggests that these two areas should be specifically considered and evaluated when using fMRI to investigate treatment efficacy. Further work is warranted to better understand the supraspinal, and potentially also spinal, processes involved.