Test-retest reproducibility of quantitative binding measures of [11C]Ro15-4513, a PET ligand for GABAA receptors containing alpha5 subunits

Introduction: Alteration of γ-aminobutyric acid “A” (GABAA) receptor-mediated neurotransmission has been associated with various neurological and psychiatric disorders. [11C]Ro15-4513 is a PET ligand with high affinity for α5-subunit-containing GABAA receptors, which are highly expressed in limbic regions of the human brain (Sur et al., 1998). We quantified the test-retest reproducibility of measures of [11C]Ro15-4513 binding derived from six different quantification methods (12 variants).

Methods: Five healthy males (median age 40 years, range 38–49 years) had a 90-min PET scan on two occasions (median interval 12 days, range 11–30 days), after injection of a median dose of 441 megabecquerels of [11C]Ro15-4513. Metabolite-corrected arterial plasma input functions (parent plasma input functions, ppIFs) were generated for all scans. We quantified regional binding using six methods (12 variants), some region-based (applied to the average time-activity curve within a region) and others voxel-based: 1) Models requiring arterial ppIFs – regional reversible compartmental models with one and two tissue compartments (2kbv and 4kbv); 2) Regional and voxelwise Logan’s graphical analyses (Logan et al., 1990), which required arterial ppIFs; 3) Model-free regional and voxelwise (exponential) spectral analyses (SA; Cunningham and Jones, 1993), which also required arterial ppIFs; 4) Methods not requiring arterial ppIFs – voxelwise standardised uptake values (Kenney et al., 1941), and regional and voxelwise simplified reference tissue models (SRTM/SRTM2) using brainstem or alternatively cerebellum as pseudo-reference regions (Lammertsma and Hume, 1996; Gunn et al., 1997). To compare the variants, we sampled the mean values of the outcome parameters within six bilateral, non-reference grey matter regions-of-interest. Reliability was quantified in terms of median absolute percentage test-retest differences (MA-TDs; preferentially low) and between-subject coefficient of variation (BS-CV, preferentially high), both compounded by the intraclass correlation coefficient (ICC). These measures were compared between variants, with particular interest in the hippocampus.

Results: Two of the six methods (5/12 variants) yielded reproducible data (i.e. MA-TD <10%): regional SRTMs and voxelwise SRTM2s, both using either the brainstem or the cerebellum; and voxelwise SA. However, the SRTMs using the brainstem yielded a lower median BS-CV (7% for regional, 7% voxelwise) than the other variants (8–11%), resulting in lower ICCs. The median ICCs across six regions were 0.89 (interquartile range 0.75–0.90) for voxelwise SA, 0.71 (0.64–0.84) for regional SRTM-cerebellum and 0.83 (0.70–0.86) for voxelwise SRTM-cerebellum. The ICCs for the hippocampus were 0.89 for voxelwise SA, 0.95 for regional SRTM-cerebellum and 0.93 for voxelwise SRTM-cerebellum.

Conclusion: Quantification of [11C]Ro15-4513 binding shows very good to excellent reproducibility with SRTM and with voxelwise SA which, however, requires an arterial ppIF. Quantification in the α5 subunit-rich hippocampus is particularly reliable. The very low expression of the α5 subunit in the cerebellum (Fritschy and Mohler, 1995; Veronese et al., 2016) and the substantial α1 subunit density in this region may hamper the application of reference tissue methods.

Approximately 5% of GABAA receptors contain the α5 subunit. The hippocampus is the structure with the highest concentration of α5-subunit-containing receptors in the human brain; for example, an ex vivo study with the α5-subunit-selective radioligand [3H]L-655,708 suggested they are present in almost 28% of GABAA receptors in this region (Sur et al., 1998). Here, they have a predominantly extrasynaptic localisation (Brunig et al., 2002) and mediate tonic inhibitory currents (Caraiscos et al., 2004). Experiments in animals suggest that agonists at receptors containing the α5 subunit negatively influence hippocampus-dependent learning and memory (Collinson et al., 2002; Crestani et al., 2002; Sternfeld et al., 2004; Yee et al., 2004; Cheng et al., 2006; Dawson et al., 2006). Whilst data in humans are lacking, the amnestic effect of alcohol on wordlist learning was reduced by pretreatment with an α5-subunit-selective inverse agonist (Nutt et al., 2007).
In healthy human volunteers, pre-scan administration of the α1-subunit-selective agonist zolpidem did not significantly decrease total [11C]Ro15-4513 volume-of-distribution (VT), but alteration of the fast (α1) component peaks (derived using exponential spectral analysis; SA) was observed. In the α5-rich hippocampus, the mean volume-of-distribution attributable to the fast peaks was reduced by approximately 70% to 0.44, whereas a slower component (presumably attributable to the α5 subunit) was reduced by approximately 13% to 10.00. More recently, human heterologous competition data acquired from nine healthy males using the α5-subunit-selective negative allosteric modulator, Basmisanil (RG1662), suggested that α5-specific binding accounts for 76% of the specific binding in the hippocampus (Myers et al., 2016). Inoue et al. (1992) used the simplified reference tissue model (SRTM) with pons as a (pseudo-) reference region to demonstrate the differences between the distribution of [11C]Ro15-4513 and [11C]flumazenil in five healthy participants. [11C]Ro15-4513 VT has also been previously quantified by voxelwise exponential SA in three healthy males (Lingford-Hughes et al., 2002). Comparison with [11C]flumazenil PET derived from six healthy males indicated significantly greater VT (by approximately 36%) in the hippocampus for [11C]Ro15-4513, with significantly less binding (approximately 43%) in the cerebellum.
[11C]Ro15-4513 binding has also been quantified in eight healthy men using both compartmental modelling and linear graphical analyses (Asai et al., 2009); the SRTM with pons as a reference was recommended based on resilience to noise, but test-retest studies were not performed.
[11C]Ro15-4513 PET has recently been used to demonstrate alterations in GABAA α5 subunit binding in alcohol dependence, schizophrenia (Asai et al., 2008), in response to levodopa challenge (Lou et al., 2016), and in preliminary studies in temporal lobe epilepsy (Barros et al., 2010a) and autism spectrum disorder (Mendez et al., 2013). To facilitate interpretation, it is necessary to document the reproducibility and reliability of quantitative methods, both regional and voxelwise, for [11C]Ro15-4513 PET. Whilst the test-retest variability of presumed α1- and α5-subunit-specific VT has been reported (Stokes et al., 2014), the variability of total VT has not. Moreover, the analysis performed by Stokes and colleagues was restricted to a single quantification method (SA), whereas variants that do not require an arterial input function, and variants that yield parametric images, still merit investigation. However, no such analysis has been published.
In the present study we quantified the test-retest reproducibility and reliability of measures of [11C]Ro15-4513 binding derived from six quantification methods, namely standardised uptake values (SUVs), one tissue compartment and two tissue compartment models (2kbv and 4kbv), graphical (linear) analyses, SA, and SRTMs, in regions representative of a variety of α5 subunit concentrations, in five healthy volunteers.

Participants
The study was approved by the London -Riverside National Health Service (NHS) Research Ethics Committee, Imperial College Healthcare NHS Trust, and University College London Hospitals NHS Foundation Trust. Permission to administer [ 11 C]Ro15-4513 was obtained from the Administration of Radioactive Substances Advisory Committee, UK. All participants provided written, informed consent according to the Declaration of Helsinki prior to participation in the study.
Seven healthy male participants were recruited and provided written, informed consent. The exclusion criteria were: a history of either neurological or psychiatric condition(s), claustrophobia, any contraindication for undergoing magnetic resonance (MR) imaging, a positive urine drug test, general practitioner's (family doctor's) advice against participation, regular medication(s) (especially benzodiazepines), a history of substance abuse (especially benzodiazepines), and a pathological modified Allen's test for patency of the ulnar artery (Allen, 1929;Slogoff et al., 1983;Cable et al., 1999). Two of the seven participants were subsequently excluded: one who withdrew from the study before the second scan, and another in whom the arterial line could not be kept patent for the entire study. Hence, a total of five healthy male participants (median age 40 years, range 38-49 years; Table 1) were scanned twice. All participants underwent a urine drug cassette test for 11-nor-Δ 9 -tetrahydrocannabinol, morphine, amphetamine, benzoylecgonine (the main metabolite of cocaine), methamphetamine and oxazepam (Monitect; BMC, California, USA) on the same day as each PET scan.

Radiochemistry
[11C]Ro15-4513 was produced on site by Hammersmith Imanet as described previously. Details of the injectate are provided in Table 1. Specific radioactivities at the time of injection were calculated in relation to the relative molecular weight of Ro15-4513 (326 g/mol).
The head position was maintained throughout and monitored with the camera's positioning laser. To compensate for minor head movements during the dynamic scans, we used a post hoc frame-by-frame realignment method, as described later ("PET data quantification" section). Sixty-three transaxial images were acquired per frame. Data were reconstructed using Fourier rebinning (FORE; (Defrise et al., 1997)) and 2D filtered backprojection (FBP: ramp filter, kernel 2.0 millimetres (mm) Full Width at Half Maximum (FWHM)). The voxel size of reconstructed images was 2.092 mm×2.092 mm×2.42 mm and the axial (on-axis) resolution 5.6 mm (Spinks et al., 2000).

Input function derivation
As in previous studies, continuous and intermittent blood samples were collected to allow the subsequent generation of arterial parent plasma input functions (ppIFs; Riaño Barros et al., 2014). During the first 15 min, blood was withdrawn continuously at a rate of 300 millilitres (ml) per hour and radioactivity detected in a bismuth germanium oxide detection system (Ranicar et al., 1991). Intermittent discrete (10 ml) samples were taken using heparinised syringes before the scan (baseline) and at the following time points after the scan start: 4, 6, 8, 10, 20, 35, 50, 65, 80 and 90 min. These discrete samples were used to quantify plasma and whole blood radioactivity via centrifugation, as well as to allow quantification of the parent fraction of the radiotracer via high-performance liquid chromatography. The plasma-over-blood ratio model and the metabolite model were fit for each scan, so that the whole blood measurements between 0 and 15 min could be corrected for plasma-over-blood ratio and parent fraction, respectively. The plasma-over-blood ratio model used was as follows: (1) [where ts is the time from scan start in hours and x1 to x4 are the model parameters]. The same sigmoidal model function was used to describe the fraction of parent [11C]Ro15-4513 remaining in plasma, as in (Myers et al., 2016).

Continuous ppIFs were derived using Clickfit in-house software running in MATLAB R2014a (The MathWorks, Natick, MA, USA), as described in detail in previous studies (Hammers et al., 2007b; Hinz et al., 2007):
1) Cross-calibration of continuous and discrete whole blood radioactivity concentration measurements (4, 6, 8, 10 min).
2) Multiplication of the cross-calibrated whole blood measurements (0-15 min) by the sigmoid function obtained from fitting the model to the plasma-over-blood ratio.
3) Combination of the resultant continuous plasma radioactivity concentration curve (0-15 min) with the discrete plasma radioactivity concentration measurements (20, 35, 50, 65, 80 and 90 min) using spline interpolation.
4) Correction for parent radiotracer fraction by multiplication of the resultant continuous plasma radioactivity concentration curve (0-90 min) by the sigmoid function obtained from fitting the model to the parent fraction.
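The two multiplicative corrections in this derivation (steps 2 and 4) can be sketched as follows. This is a minimal illustration and not the Clickfit implementation: the function names are hypothetical, the sigmoid shape and its parameters are placeholders (the form of Eq. 1 is not reproduced here), and cross-calibration and spline interpolation (steps 1 and 3) are omitted.

```python
import math

def sigmoid_fraction(t_s, lo=0.2, hi=1.0, t_half=600.0, steepness=0.005):
    """Hypothetical decreasing sigmoid over time (seconds).

    Placeholder only: this is NOT the four-parameter model of Eq. 1.
    """
    return lo + (hi - lo) / (1.0 + math.exp(steepness * (t_s - t_half)))

def parent_plasma_input(times_s, whole_blood, pob_ratio, parent_fraction):
    """Scale whole-blood activity by plasma-over-blood ratio and parent fraction.

    pob_ratio and parent_fraction are callables of time, e.g. fitted sigmoids.
    """
    return [wb * pob_ratio(t) * parent_fraction(t)
            for t, wb in zip(times_s, whole_blood)]

# Usage with constant (assumed) correction factors
ppif = parent_plasma_input([0.0, 60.0], [10.0, 20.0],
                           lambda t: 1.2, lambda t: 0.5)
```

In practice each callable would be the model fitted to that scan's discrete samples, so the corrections differ between test and retest sessions.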
2.5. MRI data acquisition, analysis and generation of regions-of-interest (ROIs)
All participants had 3D T1-weighted MRI scans with approximately millimetric isotropic voxel sizes on a Philips Intera 3 Tesla (3 T) MRI scanner (Philips, Best, The Netherlands) at the Robert Steiner MRI Unit, Hammersmith Hospital, for use during co-registration and region-of-interest (ROI) delineation. There was no gross structural abnormality on any of the T1-weighted images.
To isolate the grey matter in each brain, the T1-weighted images were segmented into tissue classes using SPM8 software (Statistical Parametric Mapping, Wellcome Trust Centre for Neuroimaging, University College London, London, www.fil.ion.ucl.ac.uk/spm) running in MATLAB R2014a. This process yielded grey matter probability maps for each participant, which were thresholded at 0.5 probability (an arbitrary value which was selected as a trade-off between over-exclusion of grey matter and over-inclusion of white matter).
To delineate the ROIs in each brain, the T1-weighted images were also anatomically segmented using MAPER (multi-atlas propagation with enhanced registration; Heckemann et al., 2010). Using high-dimensional image registration, 30 MRI data sets, each associated with manually determined labels of 83 regions (Hammers et al., 2003; Gousias et al., 2008), were propagated to the target brain. Label fusion was used to obtain an image which consisted of 83 labelled ROIs in target space (Heckemann et al., 2006).
Each participant's T1-weighted image and corresponding MAPER-derived individual anatomical segmentations, as well as individual grey matter probability images, were co-registered with the corresponding processed PET summation image for test and retest scans separately, using SPM8. For the cortical ROIs, the individual atlases in PET space were then multiplied with the thresholded grey matter probability maps using Analyze 8.1 imaging software (Mayo Clinic 2002). The output grey-matter-masked, labelled ROI images were then used to sample the dynamic or parametric PET images.

ROIs
We evaluated test-retest reliability of the quantification methods (12 variants) in six bilateral ROIs in total. Based on the expected concentrations of GABAA receptors containing α5 subunits, we selected ROIs to cover a range of binding: high-concentration limbic regions (anterior cingulate gyrus and hippocampus); intermediate-concentration regions (fusiform (occipitotemporal) gyrus, inferior frontal gyrus and insula); and a low-concentration region (the occipital lobes). The (entire) brainstem and alternatively the grey matter of the cerebellum were used as pseudo-reference regions for the SRTMs. Here, the term "pseudo-reference" region is used as neither the brainstem nor the cerebellum is entirely devoid of α5-subunit-specific binding. The methods not relying on a reference region as input were also applied to these regions, but comparison between variants was limited to the six non-reference ROIs only. Regional time-activity curves (TACs) were created by calculation of the mean radioactivity concentration over all grey-matter-masked voxels of both left and right hemisphere homologues (excluding the brainstem, which is not paired and was not grey-matter masked), for each frame. Likewise, where parametric images were used, the mean was calculated over all voxels of both homologues (again excluding the brainstem).
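The construction of a bilateral, grey-matter-masked regional TAC can be sketched as below. This is a simplified illustration on flattened voxel lists rather than the Analyze/SPM8 workflow used in the study; the function and argument names are hypothetical, and a strict inequality at the 0.5 threshold is an assumption.

```python
def regional_tac(frames, labels, gm_prob, roi_labels, gm_threshold=0.5):
    """Mean activity per frame over grey-matter voxels of the given labels.

    frames     : list of frames, each a flat list of voxel values
    labels     : atlas label per voxel (left and right homologues differ)
    gm_prob    : grey matter probability per voxel
    roi_labels : set holding the left and right label of one structure
    """
    # Keep voxels inside either homologue that pass the grey matter threshold
    # (strictly greater than the threshold is an assumption).
    idx = [i for i, (lab, p) in enumerate(zip(labels, gm_prob))
           if lab in roi_labels and p > gm_threshold]
    if not idx:
        raise ValueError("no voxels left after masking")
    return [sum(frame[i] for i in idx) / len(idx) for frame in frames]

# Two frames, four voxels; labels 1 and 2 stand for left/right homologues
tac = regional_tac(frames=[[2.0, 4.0, 100.0, 7.0], [4.0, 8.0, 100.0, 7.0]],
                   labels=[1, 2, 1, 3],
                   gm_prob=[0.9, 0.8, 0.2, 0.7],
                   roi_labels={1, 2})
```

The third voxel carries an ROI label but is excluded because its grey matter probability falls below the threshold, which is the purpose of the masking step.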

PET data quantification
Where required (one of 10 datasets, based on a maximum translation > 5 mm as estimated using the SPM "Realign" function), the dynamic PET images were de-noised and corrected for movement frame-by-frame using wavelets in Piwave 8.0 (Studholme et al., 1997; Turkheimer et al., 1999) running in MATLAB R2014a. The frame starting at 150 s (frame 6) was used as reference due to its high signal-to-noise ratio and the likelihood that the participant had remained still during the first three minutes of the scan. Frames 1-5 were not corrected due to their low signal-to-noise ratio. The remaining frames (6-24) were automatically resliced and re-concatenated with frames 1-5 into a new dynamic image (Hammers et al., 2007b).
Summation images (also known as 'add' or 'static' images) were created for frames 1-24, frames 16-21, and frames 22-24 with correction for 11C radiodecay using MICKPM (Modelling, Input functions, and Compartmental Kinetics - Parametric Maps) version 5.4 software (available on request from Rainer Hinz, Wolfson Molecular Imaging Centre, University of Manchester, Manchester, UK), which itself uses MATLAB R2009bSP1. The summation images were required for calculation of global radioactivity concentrations, for use as the reference image during co-registration of the T1-weighted MRI data, and as the input for calculation of SUV images. A binary mask of the brain, which encompassed approximately 9 mm beyond the outer cortical boundary, was also created semi-automatically using Analyze 8.1 (Mayo Clinic, Rochester, Minnesota) for use during the subsequent generation of parametric images.
Quantification of binding was performed as described below, using the same ROIs for each variant. We used a 64-bit PC (Intel Core i5 5300U CPU 2.30 GHz, 8.0 GB RAM) with Windows 7 Enterprise operating system. Variants in which parameters were calculated directly from the ROI TAC data (generated as described in Section 2.6) are henceforth referred to as "regional". Variants in which parameters were calculated on a voxel-by-voxel basis in order to generate parametric images are referred to as "voxelwise"; in this case the parametric image itself was sampled (mean, standard deviation) within the same ROIs as used for regional variants. The quantification methods were:
1. Compartmental models, requiring arterial ppIFs
   (a) Reversible two-compartment (one tissue compartment) model with two rate constants and variable blood volume (2kbv)
   (b) Reversible three-compartment (two tissue compartment) model with four rate constants and variable blood volume (4kbv)
2. Graphical analyses, requiring arterial ppIFs
   (a) Regional Logan's graphical analysis with arterial ppIF
   (b) Voxelwise Logan's graphical analysis with arterial ppIF
3. Model-free analyses, requiring arterial ppIFs
   (a) Regional "classic" (non-regularised) SA
   (b) Voxelwise "classic" SA
4. Methods not requiring arterial ppIFs
   (a) Voxelwise standardised uptake values (SUVs)
       (i) 30.5-60.5 min
       (ii) 60.5-90.5 min
   (b) Regional SRTM using brainstem
   (c) Voxelwise SRTM2 using brainstem
   (d) Regional SRTM using cerebellum
   (e) Voxelwise SRTM2 using cerebellum

Weighting of ROI and voxel TACs
For each participant, all ROI/voxel TACs (not yet decay-corrected) were weighted by the same values, which were calculated for each frame i (i = 1, 2, 3, …, 24; non-decay-corrected data) according to: [where Wi: weight for frame i; Li: length of frame i (seconds); Ti: total of true coincidences (per second) for frame i]. For ROI TACs, the weights were normalised to sum(weights)=24 (i.e. number of frames), thresholded to max(weight)≤2.5, and then renormalised to sum(weights)=24. For voxel TACs, the weights were not normalised but were thresholded to max(weight)/min(weight)≤1000, according to the RPM (Receptor Parametric Mapping software) scheme (Gunn et al., 1998; Aston et al., 2001).
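The normalise, threshold, renormalise sequence described for ROI TAC weights can be sketched as follows. The raw per-frame weights are taken as given, since the weighting formula itself is not reproduced here; the function name is hypothetical.

```python
def normalise_roi_weights(raw_weights, cap=2.5):
    """Normalise weights to sum to the number of frames, cap, renormalise."""
    n = len(raw_weights)
    total = sum(raw_weights)
    # Step 1: normalise so that the weights sum to the number of frames.
    weights = [w * n / total for w in raw_weights]
    # Step 2: threshold any weight above the cap.
    weights = [min(w, cap) for w in weights]
    # Step 3: renormalise so that the weights again sum to n.
    total = sum(weights)
    return [w * n / total for w in weights]

# 24 frames, one of which would otherwise dominate the fit
weights = normalise_roi_weights([1.0] * 23 + [100.0])
```

Note that the final renormalisation can push the largest weight above the cap again; the sketch follows the order of operations as described rather than iterating until the cap holds.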
2.9. Reversible compartmental models, requiring arterial ppIFs
MICK (Modelling, Input functions and Compartmental Kinetics) version 5.2 software (available on request from Rainer Hinz, Wolfson Molecular Imaging Centre, University of Manchester, Manchester, UK; see Supplementary material) was used to fit all regional compartmental models with the Nelder-Mead optimisation algorithm (Nelder and Mead, 1965). MICK uses MATLAB R2009bSP1.

2kbv
In this model, three microparameters are derived: K1, the rate constant for influx of the ligand from the plasma to the tissue compartment containing free, non-specifically bound, and specifically bound ligand; k2, the efflux rate constant from the tissue back to plasma; and bv, the blood volume term. The VT is then calculated according to the compartmental model equation (Innis et al., 2007):

$$V_T = \frac{K_1}{k_2}$$

4kbv
In addition to K1 and k2 (described above for the 2kbv model), two further rate constants were estimated: k3, which describes the transfer from the free and non-specifically bound compartment to the specifically bound (second tissue) compartment; and k4, which describes the opposite transfer. Again, the blood volume term was also computed. According to the consensus nomenclature (Innis et al., 2007):

$$V_T = \frac{K_1}{k_2}\left(1 + \frac{k_3}{k_4}\right)$$

We used starting estimates of K1 = 0.01 ml cm⁻³ min⁻¹, k2-k4 = 0.001 min⁻¹ and bv = 0.05.
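As a worked illustration of how VT follows from the fitted rate constants under the consensus nomenclature (Innis et al., 2007), the sketch below evaluates the one-tissue expression K1/k2 and the two-tissue expression (K1/k2)(1 + k3/k4). The function names and numerical values are illustrative assumptions, not results from this study.

```python
def vt_one_tissue(K1, k2):
    """Total volume of distribution for the one-tissue model (2kbv)."""
    return K1 / k2

def vt_two_tissue(K1, k2, k3, k4):
    """Total volume of distribution for the two-tissue model (4kbv)."""
    return (K1 / k2) * (1.0 + k3 / k4)

# Illustrative rate constants (min^-1 for k2-k4; ml cm^-3 min^-1 for K1)
vt_1tc = vt_one_tissue(K1=0.1, k2=0.05)
vt_2tc = vt_two_tissue(K1=0.1, k2=0.05, k3=0.02, k4=0.01)
```

The blood volume term bv does not enter VT directly; it only affects how the measured TAC is apportioned between tissue and vasculature during fitting.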
2.10. Graphical analyses, requiring arterial ppIFs

2.10.1. Regional Logan's graphical analysis with arterial ppIF
Logan's graphical analysis (Logan et al., 1990) is a linear analysis applicable to radioligands with reversible binding. After some time (t*), a plot of

$$\frac{\int_0^t \mathrm{TAC}(\tau)\,d\tau}{\mathrm{TAC}(t)} \quad \text{versus} \quad \frac{\int_0^t \mathrm{ppIF}(\tau)\,d\tau}{\mathrm{TAC}(t)}$$

is linear [where ppIF(t): metabolite-corrected plasma radioactivity concentration at time t; TAC(t): region of interest radioactivity concentration at time t]. For the two-tissue compartment model 4kbv, the slope of the plot is:

$$\frac{K_1}{k_2}\left(1 + \frac{k_3}{k_4}\right) + bv$$

[where bv is the blood volume term], from which VT can be calculated as above. MICK was used to fit regional Logan's graphical analyses with the Nelder-Mead optimisation algorithm (Nelder and Mead, 1965). We used fixed parameters of t* = 1680 s (i.e. 28 min, based on a preliminary inspection of the plots), bv = 0.028 (the median across 83 brain regions and the 10 scans, as estimated using 4kbv; interquartile range 0.022-0.036), and equal weights (i.e. each frame was weighted by the same value). The contribution due to vasculature (bv) was subtracted from the ROI TAC prior to the graphical analysis.
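A minimal sketch of a Logan plot fit is given below, assuming trapezoidal integration and an unweighted least-squares line through the points at or after t*. It is not the MICK implementation: the function names are hypothetical, no blood volume correction is applied, and the synthetic one-tissue data are an assumption chosen so that the true VT is known.

```python
import math

def cumulative_trapezoid(t, y):
    """Running integral of y over t using the trapezoidal rule."""
    out = [0.0]
    for i in range(1, len(t)):
        out.append(out[-1] + 0.5 * (y[i] + y[i - 1]) * (t[i] - t[i - 1]))
    return out

def logan_fit(t, tac, ppif, t_star):
    """Return (slope, intercept) of the Logan plot for samples with t >= t_star."""
    int_tac = cumulative_trapezoid(t, tac)
    int_ppif = cumulative_trapezoid(t, ppif)
    xs, ys = [], []
    for ti, c, i_tac, i_pp in zip(t, tac, int_tac, int_ppif):
        if ti >= t_star and c > 0:
            xs.append(i_pp / c)
            ys.append(i_tac / c)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic one-tissue data (assumed values): VT = K1 / k2 = 10
K1, k2, a = 0.005, 0.0005, 0.001           # per second
t = [10.0 * i for i in range(541)]          # 0 to 5400 s
ppif = [math.exp(-a * ti) for ti in t]
tac = [K1 * (math.exp(-k2 * ti) - math.exp(-a * ti)) / (a - k2) for ti in t]
slope, intercept = logan_fit(t, tac, ppif, t_star=1680.0)
```

With noise-free data the slope recovers VT closely; with noisy TACs the Logan slope is known to be biased, which is one reason to inspect the plots before fixing t*.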
2.10.2. Voxelwise Logan's graphical analysis with arterial ppIF
Parametric images of [11C]Ro15-4513 VT were generated from smoothed (isotropic filter with 2.0 mm FWHM) dynamic images and the ppIFs using voxelwise Logan's graphical analysis with ppIF, as implemented in MICKPM, using the same fixed parameters as listed above.
2.11. Model-free analyses, requiring arterial ppIFs

2.11.1. Regional (non-regularised) SA
VTs for each ROI were obtained from the dynamic images and the ppIFs using SA (Turkheimer et al., 1994), as implemented in MICK using the non-negative least squares (NNLS) algorithm (Lawson and Hanson, 1995). The analysis used a base with 100 logarithmically-spaced functions. The fast frequency boundary was kept at the default value of 0.1 s⁻¹. The theoretical slow frequency boundary is based on the decay constant of 11C (t½ ≈ 20 min, decay constant 0.0005663 s⁻¹, log10 = −3.25). Based on previous work with tracers with relatively slow kinetics (Hammers et al., 2007b; Riaño Barros et al., 2014) and preliminary investigations (Barros et al., 2010b), we changed this to 0.00063 s⁻¹ (log10 = −3.20) in order to reduce noise.
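The frequency grid described above can be reproduced with a few lines of arithmetic. The sketch below computes the 11C decay constant and a set of 100 logarithmically spaced rate constants between the slow and fast boundaries; the 20.4 min half-life and the function names are assumptions for illustration, and the NNLS fit itself is not shown.

```python
import math

C11_HALF_LIFE_S = 20.4 * 60.0   # assumed half-life of 11C in seconds

def decay_constant(half_life_s):
    """Decay constant (s^-1) from a half-life in seconds."""
    return math.log(2.0) / half_life_s

def log_spaced_rates(slow, fast, n=100):
    """n rate constants spaced evenly in log10 between slow and fast."""
    lo, hi = math.log10(slow), math.log10(fast)
    step = (hi - lo) / (n - 1)
    return [10.0 ** (lo + i * step) for i in range(n)]

lam = decay_constant(C11_HALF_LIFE_S)             # about 0.000566 s^-1
rates = log_spaced_rates(slow=0.00063, fast=0.1)  # boundaries from the text
```

Raising the slow boundary slightly above the decay constant removes the slowest basis functions, which are the ones most easily fitted to noise in the late frames.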

Voxelwise SA
Parametric images of [11C]Ro15-4513 VT were generated from the dynamic images and the ppIFs using voxelwise SA as implemented in MICKPM using the NNLS algorithm, with the same number of logarithmic functions and the same fast and slow frequency boundaries as listed above.
2.12. Methods not requiring arterial ppIFs

2.12.1. Voxelwise standardised uptake values (SUVs)
Standardised uptake value images (SUVs) were generated from the decay-corrected summation (add) images in SPM8 for frames 16-21 and for frames 22-24, i.e. from 30.5 to 60.5 and from 60.5 to 90.5 min respectively, according to Kenney et al. (1941):

$$\mathrm{SUV} = \frac{\text{tissue radioactivity concentration}}{\text{injected radioactivity} / \text{body weight}}$$

2.12.2. Regional SRTM using brainstem or alternatively using cerebellum
GABAA receptors are widespread in the brain, and a true reference region devoid of α5-subunit-specific binding does not exist. Attempts have been made to obviate arterial cannulation by using the brain region with the lowest receptor concentration as a pseudo-reference region. The brainstem and the cerebellum are two of the structures with the lowest concentration of α5 subunits in the human brain (Fritschy and Mohler, 1995; Pirker et al., 2000; Sieghart and Sperk, 2002; Veronese et al., 2016). We therefore used the brainstem and the cerebellum, separately, as a pseudo-reference region in the SRTM (Lammertsma and Hume, 1996) as implemented in MICK, with Nelder-Mead optimisation. We used starting estimates R1 = 0.95, k2a = 0.001 min⁻¹ and k2′ (k2RefRegion; efflux rate constant from the reference compartment back to plasma) = 0.001 min⁻¹. The model reduces to the following equation (Wu and Carson, 2002), from which the binding potential (BPND; Innis et al., 2007) can be calculated:

$$C(t) = R_1\,C_r(t) + R_1\,(k_2' - k_{2a})\;C_r(t) * e^{-k_{2a}t}$$

[where *: convolution operator; Cr(t): radioactivity concentration in the reference region tissue; C(t): total radioactivity concentration timecourse in the tissue; k2a: the apparent k2, i.e. k2/(1+BP); R1: the relative delivery, i.e. the ratio K1/K1′, where K1′ (ml ml⁻¹ min⁻¹) is the rate constant for the influx of the ligand from the plasma to the reference compartment; t: time (min)].
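To make the role of the three SRTM parameters concrete, the sketch below evaluates the operational equation of Wu and Carson (2002), C(t) = R1·Cr(t) + R1·(k2′ − k2a)·Cr(t) * exp(−k2a·t), with a trapezoidal convolution, and derives BPND from k2a = k2/(1+BP) under the usual SRTM assumption that k2 = R1·k2′. It is an illustrative forward model with hypothetical function names, not the MICK fitting routine.

```python
import math

def bp_nd(r1, k2_ref, k2a):
    """Binding potential from k2a = k2 / (1 + BP), assuming k2 = R1 * k2_ref."""
    return r1 * k2_ref / k2a - 1.0

def srtm_tac(times, c_ref, r1, k2_ref, k2a):
    """Model tissue curve from a reference-region curve (same time units as k2)."""
    out = []
    for i, ti in enumerate(times):
        conv = 0.0
        # Trapezoidal estimate of the convolution of c_ref with exp(-k2a * t)
        for j in range(1, i + 1):
            dt = times[j] - times[j - 1]
            f_prev = c_ref[j - 1] * math.exp(-k2a * (ti - times[j - 1]))
            f_curr = c_ref[j] * math.exp(-k2a * (ti - times[j]))
            conv += 0.5 * dt * (f_prev + f_curr)
        out.append(r1 * c_ref[i] + r1 * (k2_ref - k2a) * conv)
    return out

times = [0.0, 1.0, 2.0, 3.0]
c_ref = [0.0, 1.0, 2.0, 1.0]
# With R1 = 1 and k2a equal to k2_ref (BP = 0) the tissue curve equals the reference
same = srtm_tac(times, c_ref, r1=1.0, k2_ref=0.1, k2a=0.1)
higher = srtm_tac(times, c_ref, r1=1.0, k2_ref=0.2, k2a=0.1)
```

A smaller k2a relative to k2′ means slower apparent washout from the target tissue, which is what a positive binding potential expresses.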
2.12.3. Voxelwise SRTM2 using brainstem or alternatively using cerebellum
Parametric images of BPND were generated from the dynamic images using a two-step procedure, SRTM2 (Wu and Carson, 2002) with 100 basis functions (Gunn et al., 1997) in MICKPM. Consistent with the SA, we used beta min = 0.00063 s⁻¹. We used beta max = 0.014 s⁻¹ (similar to Gunn et al., 1997). For each participant, k2RefRegion was set to the global median of k2RefRegion estimates derived from a first-pass SRTM, which itself used the same fixed parameters and a tight brain mask (Wu and Carson, 2002).

Global radioactivity concentration
Global radioactivity concentrations were calculated for each decay-corrected, summed radioactivity image (frames 01-24) with an in-house script adapted from SPM (Hammers et al., 2007a), where the global radioactivity concentration is defined as the mean voxel value within a mask. The mask itself is defined as all voxels exceeding one-eighth of the mean value of all voxels in the entire image matrix.
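The global measure defined above amounts to a two-pass mean, sketched here on a flat list of voxel values. The function name is hypothetical and the sketch is not the in-house script itself.

```python
def global_concentration(voxels):
    """Mean of the voxels exceeding one-eighth of the whole-image mean."""
    image_mean = sum(voxels) / len(voxels)
    threshold = image_mean / 8.0
    inside = [v for v in voxels if v > threshold]
    return sum(inside) / len(inside)

# Four background voxels and two brain voxels
value = global_concentration([0.0, 0.0, 0.0, 0.0, 8.0, 8.0])
```

Here the whole-image mean is about 2.67, the threshold about 0.33, and only the two brain voxels enter the final mean.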

Comparison of test and retest injectate data
For the statistical testing we used SPSS for Windows version 22 software (IBM 2013, NY, USA). Injectate data (injected radioactivity, radiochemical purity, co-injected mass of stable ligand, and specific radioactivity at the time of injection) and global radioactivity concentration were compared between test and retest sessions using Student's paired samples t-test (for data with a normal distribution) or the nonparametric Wilcoxon signed-rank test (for data which differed significantly from the normal distribution, i.e. Kolmogorov-Smirnov test p < 0.05).

Model fit and within-ROI variability
The median residual sum of squares (RSS) was calculated for each ROI, where available (regional variants), as a summary measure of the fit of the model to the observed data. Alternatively, the median within-subject coefficient of variation (WS-CV) was calculated, where available (voxelwise variants), as a summary measure of the within-ROI variability in the binding parameter.

Reproducibility
To assess test-retest variation (i.e. reproducibility), the median (signed and alternatively absolute) percentage difference between test and retest studies, as well as its range, was calculated for each ROI, for each variant. The (signed) percentage test-retest differences of the binding parameters obtained were calculated according to:

Reliability
Reliability was calculated using the intraclass correlation coefficient (ICC; MacLennan, 1993): [where MS: mean sum of squares; BS: between-subject; WS: within-subject; and df: degrees of freedom]. The ICC is provided to allow assessment of the reliability of the measure as a function of both within-subject variability and between-subject variability; the closer the ICC to 1, the more reliable the variant, i.e. the smaller the within-subject variability of the parameter compared with natural between-subject variability. ICCs were computed in SPSS using the "one-way random" model. We report the "single measures" ICC.
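For readers without SPSS, a one-way random, single-measures ICC can be computed from the between- and within-subject mean squares of a one-way ANOVA. The sketch below uses the standard textbook form (MS_BS − MS_WS) / (MS_BS + (k − 1)·MS_WS) for k scans per participant; whether this matches the exact expression used in the study is an assumption, and the function name and example values are hypothetical.

```python
def icc_one_way_single(data):
    """One-way random, single-measures ICC.

    data: one tuple of k repeated measurements per participant.
    """
    n = len(data)
    k = len(data[0])
    grand_mean = sum(sum(row) for row in data) / (n * k)
    subject_means = [sum(row) / k for row in data]
    # Between-subject mean square (n - 1 degrees of freedom)
    ms_bs = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    # Within-subject mean square (n * (k - 1) degrees of freedom)
    ms_ws = sum((x - m) ** 2
                for row, m in zip(data, subject_means)
                for x in row) / (n * (k - 1))
    return (ms_bs - ms_ws) / (ms_bs + (k - 1) * ms_ws)

# Illustrative test/retest pairs for five participants (made-up values)
icc = icc_one_way_single([(10.0, 10.4), (12.1, 11.8), (9.2, 9.5),
                          (13.0, 13.3), (11.1, 10.7)])
```

Because the between-subject mean square appears in both numerator and denominator, a variant with very low BS-CV can have a modest ICC even when its test-retest differences are small.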
2.14.5. Regional heterogeneity
Finally, the ratio of binding in the highest-binding region (hippocampus) to the lowest-binding non-reference region (occipital lobes) was calculated to allow assessment of each variant's ability to depict the known heterogeneity in α5 subunit availability across the brain. Ratios ("x") of 1.5≤x<1.8 were described as "moderate" heterogeneity; 1.8≤x<2.0 were described as "high" heterogeneity; x≥2.0 were described as "very high" heterogeneity.
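The descriptive bands above translate directly into a small lookup, sketched below. The function name is hypothetical, and ratios below 1.5 return None because no label is defined for them in the text.

```python
def heterogeneity_label(hippocampus_value, occipital_value):
    """Label the hippocampus-to-occipital ratio using the bands in the text."""
    x = hippocampus_value / occipital_value
    if x >= 2.0:
        return "very high"
    if x >= 1.8:
        return "high"
    if x >= 1.5:
        return "moderate"
    return None  # no descriptor is defined below 1.5

label = heterogeneity_label(hippocampus_value=9.5, occipital_value=5.0)
```

The example ratio of 1.9 falls in the "high" band.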

Reproducibility and reliability of blood and PET data quantification
See Sections 2 and 3 of the Supplementary material for details of the parameters derived from the metabolite and plasma-over-blood ratio models, and the six PET quantification methods (12 variants). Figs. 1 to 5 provide examples of the output.

Comparison between analysis variants
The analyses did not produce any outliers for the ROIs (where "outlier" is defined as VT or BPND ≤0 and/or WS-CV>50% for regional variants and mean VT or mean BPND ≤0 and/or WS-CV>100% for voxelwise variants). For well-performing regional variants, there was no evidence of bias or structure in the weighted residuals.

Table 2 provides an overview of the MA-TDs (%) for the six different methods (12 variants). The MA-TDs were very low to low for all SRTM variants and voxelwise SA (≤5%); MA-TDs were also low (<10%) for most ROIs with the 4kbv model. These variants all had very low test-retest differences in the hippocampus (<5%).

Table 3 provides an overview of the BS-CV (%) for the six different methods (12 variants). The median BS-CVs were moderate for most or all of the ROIs (11-15%) for several variants: 2kbv, 4kbv, voxelwise Logan's graphical analysis, both regional and voxelwise SA, and SUVs (30.5-60.5 and 60.5-90.5 min). The remaining variants, particularly SRTMs using a pseudo-reference region, were characterised by low (≤10%) BS-CVs for nearly all ROIs.

Table 4 provides an overview of the ICCs for the six different methods (12 variants). The median ICC was excellent (>0.80) for both the voxelwise SA, and for the voxelwise SRTM2 with the cerebellum as a pseudo-reference region. Regional SRTM using cerebellum also yielded a good (>0.70) median ICC. Other variants yielded low to moderate (≤0.70) median ICCs. Regarding the hippocampus, the 4kbv model, voxelwise SA, and both regional and voxelwise SRTM/SRTM2 using cerebellum, and voxelwise SRTM2 using the brainstem all yielded excellent (>0.85) median ICCs. Regional SRTM with the brainstem yielded a very good median (0.77) ICC.

Table 4 also shows the ratio between the hippocampus, which for all variants was the region with highest binding, and a low-binding non-reference region (occipital lobes). Voxelwise SA, regional SRTM using brainstem, and both regional SRTM and voxelwise SRTM2 using the cerebellum all yielded very high ratios (≥2.0).

Discussion
We describe the test-retest reproducibility and reliability of quantification of the availability of the GABAA receptor α5 subunit in five healthy human participants. Our major finding is that very good to excellent reproducibility of estimates, in terms of percentage test-retest difference, is achievable using regional and voxelwise implementations of the SRTM and also using model-free, voxelwise SA.
Voxelwise SA was the best-performing variant, in terms of ICCs, and one of the best in terms of percentage test-retest difference. This variant also yielded a slightly higher median BS-CV (11%) than SRTM-based variants, and had a high ratio of hippocampal-to-occipital lobe VT. We note that voxelwise SA markedly outperformed SA applied to regional TACs. This phenomenon has also been documented for the opioid receptor radioligand [11C]diprenorphine (Hammers et al., 2007b) and the cannabinoid receptor type 1 radioligand, [11C]MePPEP (Riaño . We suggest that the voxelwise approach benefits from the flexibility to be able to accommodate differences in blood volume, tissue class partial volume and receptor concentration between voxels, in contrast to variants that use the averaged regional TAC. The assumptions inherent to SA are that: 1) the compartmental systems are strongly connected; 2) the exchange of material with the environment is confined to a single compartment; and 3) there is no possibility for material to pass from one compartment through two or more compartments back to the initial compartment (Schmidt, 1999). There is no evidence to indicate that SA is biased towards or against any particular patient population. An arterial input function is required, as the fit assumes a sum of positive series of convolution integrals of the input function. One advantage of SA is that it is "data driven", i.e. a priori model selection is not required. Like all voxel-based methods, the generation of parametric VT images via voxelwise SA has the added advantage of allowing whole-brain surveys in diseases where the exact localisation of pathology is not known, e.g. refractory focal epilepsy.
Of the compartmental models, the 2kbv model had a very high median percentage test-retest variability (MA-TD; 25%), whilst the 4kbv model had an acceptable MA-TD (8%). However, a wide range of percentage test-retest variability was observed across participants for each region, other than in the hippocampus (3%, range −2 to 7%). These data are in keeping with previous findings, in which the fits with two-tissue compartment models were better than those seen with one-tissue compartment models (Asai et al., 2009; Myers et al., 2012, 2016). While [11C]Ro15-4513 has highest affinity for GABAA receptors containing α5 subunits (Lingford-Hughes et al., 2002), it also binds to GABAA receptors containing α1 and other α subunits, albeit with approximately 10–15 times lower affinity (Hadingham et al., 1993; Luddens et al., 1994; Myers et al., 2012; Stokes et al., 2013). Recently, human heterologous competition data acquired from healthy males using the α5-subunit-selective negative allosteric modulator, Basmisanil (RG1662), suggested that α5-specific binding accounts for 60–70% of the specific binding in most regions (Myers et al., 2016). As the regional distributions of α subunits overlap, the tissue kinetics, model fits and hence reliability will vary according to the proportion of subunits (Maeda et al., 2003; Myers et al., 2012). Model-free quantification, such as with SA, offers flexibility to deal with the complex compartmentalisation of the radioligand targets (Riaño Barros et al., 2014).
Logan's graphical analyses had a very high MA-TD, whether applied to regional TACs or on a voxel-by-voxel basis. It is possible that more than nine frames are required to accurately fit the plot, although we did smooth the dynamic images before voxelwise analyses. Also, the analyses assumed a fixed blood volume contribution of 0.028, the median derived from multiple regions and scans, which cannot be correct for each ROI in each participant.
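To make the graphical analysis concrete, the following is a minimal sketch of a Logan plot with a plasma input, in which VT is estimated as the slope of the late linear portion. It is illustrative only: it uses a synthetic, noise-free one-tissue curve with hypothetical rate constants, a finely sampled time grid rather than PET frames, and no blood volume correction (the fixed contribution of 0.028 used in the study is not modelled). The function names and the choice of `t_star` are assumptions.

```python
import math


def cumulative_trapezoid(times, values):
    """Running trapezoidal integral of values over times, starting at zero."""
    out = [0.0]
    for i in range(1, len(times)):
        step = 0.5 * (values[i] + values[i - 1]) * (times[i] - times[i - 1])
        out.append(out[-1] + step)
    return out


def logan_vt(times, tissue, plasma, t_star):
    """Least-squares slope of the Logan plot using samples at or after t_star."""
    int_tissue = cumulative_trapezoid(times, tissue)
    int_plasma = cumulative_trapezoid(times, plasma)
    xs, ys = [], []
    for t, ct, ict, icp in zip(times, tissue, int_tissue, int_plasma):
        if t >= t_star and ct > 0:
            xs.append(icp / ct)  # integral of plasma / tissue activity
            ys.append(ict / ct)  # integral of tissue / tissue activity
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Synthetic one-tissue example with hypothetical parameters (true VT = K1 / k2)
K1, k2, a = 0.3, 0.05, 0.1
times = [0.1 * i for i in range(901)]  # 0 to 90 min
plasma = [math.exp(-a * t) for t in times]
tissue = [K1 * (math.exp(-a * t) - math.exp(-k2 * t)) / (k2 - a) for t in times]
vt_estimate = logan_vt(times, tissue, plasma, t_star=30.0)
print(vt_estimate, K1 / k2)
```

With real data the regression uses only the frames after `t_star`, so a small number of late frames and noise in the tissue curve both affect the slope directly.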
In the present study, we quantified the total VT, rather than attempting to isolate the presumed α5-subunit-specific volume-of-distribution (VS), for example via bandpass SA (Stokes et al., 2014). However, accurate isolation of the VS is challenging and is vulnerable to the effects of tissue heterogeneity and noise. Whilst VS appears to exhibit a tight relationship with the 'true' α5-subunit-specific VT in regions with moderate or high α5 subunit concentration, the total VT also exhibits a tight, linear association (Myers et al., 2016).
As expected, our analyses revealed that the reproducibility of [11C]Ro15-4513 VT was sensitive to multiple methodological choices, e.g. derivation of the input function and method used to calculate the weighting factors (e.g. Yaqub et al., 2006; data not shown).
Non-invasive PET is preferable in both research and clinical studies, in order to avoid the discomfort and slight risks attributable to arterial cannulation, a procedure which demands expertise. GABAA receptor α5 subunits are expressed throughout the brain, and a true reference region does not exist. Here we used the brainstem and alternatively the cerebellum as pseudo-reference regions, based on their near-negligible expression of the α5 subunit (Fritschy and Mohler, 1995; Pirker et al., 2000; Sieghart and Sperk, 2002; Veronese et al., 2016). This approach is supported by recent data that suggest the VS is low in the cerebellum and extremely low in the pons (Myers et al., 2016). In the present study, both variants yielded reproducible data, in terms of percentage test-retest difference, whether applied to regional TACs or on a voxel-by-voxel basis. BS-CV was low, however, which perhaps reflects a bias of the reference region methods. The actual BPND values were much lower for the SRTMs when using cerebellum as the pseudo-reference rather than the brainstem, which probably reflects greater α5-subunit-specific binding in the former (Myers et al., 2016). A wider range of percentage test-retest difference was seen for the brainstem than for the cerebellum with compartmental models. We observed lower signal-to-noise ratio, i.e. noisier time-activity curves, for the smaller brainstem ROI, which impaired model fitting.
Whilst the recent Basmisanil (RG1662) blocking study found a tight, linear relationship between BPND and the 'true' α5-subunit-specific volume-of-distribution with both variants (Myers et al., 2016), SRTMs should only be used if even very small intra- or between-subject variations in the amount of GABAA receptor-specific binding in these pseudo-reference regions can be excluded. It might be possible to improve on these results by utilising a sub-region of the brainstem or cerebellum, or via a more sophisticated pseudo-reference region approach (e.g. Turkheimer et al., 2012).
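The sensitivity of BPND to specific binding in a pseudo-reference region can be illustrated with simple arithmetic. The sketch below assumes that the reference tissue estimate approximates the distribution volume ratio minus one, and that the non-displaceable volume (VND) is the same in the target and reference regions. All numerical values and the function name are hypothetical and are not taken from the study.

```python
def bp_nd_from_volumes(v_nd, v_s_target, v_s_reference=0.0):
    """BPND approximated as (VT target / VT reference) - 1, assuming equal VND in both regions."""
    vt_target = v_nd + v_s_target
    vt_reference = v_nd + v_s_reference
    return vt_target / vt_reference - 1.0


# Hypothetical volumes of distribution (mL/cm3)
v_nd, v_s_target = 2.0, 6.0

ideal = bp_nd_from_volumes(v_nd, v_s_target)              # no specific binding in the reference
low_vs_ref = bp_nd_from_volumes(v_nd, v_s_target, 0.1)    # very low specific binding
higher_vs_ref = bp_nd_from_volumes(v_nd, v_s_target, 0.5) # more specific binding
print(ideal, low_vs_ref, higher_vs_ref)
```

Under these assumptions, any specific binding in the reference region lowers BPND, and any change in that binding between scans or between participants appears as a change in the target BPND even when target binding is constant.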
SUVs can constitute a simple and reliable measure of radioligand binding that obviates the need for arterial blood sampling (Riaño . In the present study, [11C]Ro15-4513 SUVs were moderately reproducible overall (median MA-TD 11%), but had a wide range in percentage test-retest difference for most regions. The SUVs were moderately reliable (median ICC 0.70), which is partly attributable to the large BS-CV (median 15%). Overall these data suggest that factors other than weight and injected dose significantly influence reproducibility. Given the performance of voxelwise SA with arterial input function, and the moderately high MA-TD we observed in the area under the metabolite model curve (12%, see Supplementary material, Table 2), we hypothesise that such factors include the rate of metabolism of the parent radioligand.
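As a reminder of what the SUV does and does not normalise for, a minimal sketch of the conventional body-weight SUV follows. It assumes a tissue density of 1 g/mL and decay-corrected activity. The function name, the example activity concentration and the 70 kg body weight are hypothetical, while 441 MBq is the median injected dose in this study.

```python
def suv(activity_kbq_per_ml, injected_dose_mbq, body_weight_kg):
    """Body-weight SUV: tissue activity concentration / (injected dose / body weight).

    Assumes tissue density of 1 g/mL, so the result is dimensionless.
    """
    dose_kbq = injected_dose_mbq * 1000.0
    weight_g = body_weight_kg * 1000.0
    return activity_kbq_per_ml / (dose_kbq / weight_g)


# Hypothetical regional activity of 6.3 kBq/mL in a 70 kg participant given 441 MBq
example_suv = suv(6.3, 441.0, 70.0)
print(example_suv)
```

Only injected dose and body weight enter the calculation, so between-scan differences in plasma clearance or parent radioligand metabolism are not accounted for.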
The present study is limited by the sample size; in particular ICCs must be treated with caution when using paired data acquired from eight or fewer participants (Walter et al., 1998; Shoukri et al., 2004). As the free fraction of parent radioligand in the plasma was not quantified, we cannot comment on the reproducibility or reliability of BP f or VT f (Innis et al., 2007). However, voxelwise SA based on arterial ppIFs that were not corrected for plasma free fraction still yielded reproducible and reliable VT data. The test-retest scan interval was short, but varied from a week to two months between participants; this variable was not associated with reproducibility. The lack of females in our study population is an additional limitation. To the best of our knowledge, the potential influence of the menstrual cycle on the availability of the GABAA receptor or of the α5 subunit in particular has not been studied. However, the menstrual cycle influences GABA concentration in the frontal lobe, as measured by proton magnetic resonance spectroscopy (e.g. Harada et al., 2011; De Bondt et al., 2015), which could conceivably lower test-retest reproducibility. Hence, confirmation of our findings in women would be desirable.

Conclusions
Quantification of [11C]Ro15-4513 binding shows very good to excellent reproducibility with SRTMs and voxelwise SA. Quantification of binding in the α5-subunit-rich hippocampus is particularly reliable. Whilst SA necessitates arterial blood sampling, it is preferable to the SRTMs due to the lack of a true reference region.
[11C]Ro15-4513 PET is well-placed as a tool to study the availability of the GABAA receptor α5 subunit in health and neuropsychiatric disease.

Conflicts of interest
The authors do not report any conflicts of interest.

Table 3
Mean between-subject coefficients of variation (BS-CV; %) for participants' parameter estimates (BPND/SUV/VT) obtained with the six different methods (12 variants).