Accuracy and reliability of [11C]PBR28 specific binding estimated without the use of a reference region

[11C]PBR28 is a positron emission tomography radioligand used to estimate the expression of 18kDa translocator protein (TSPO). TSPO is expressed on glial cells and can function as a marker for immune activation. Since TSPO is expressed throughout the brain, no true reference region exists. For this reason, an arterial input function is required for accurate quantification of [11C]PBR28 binding and the most common outcome measure is the total distribution volume (VT). Notably, VT reflects both specific binding and non-displaceable binding (VND). Therefore, estimates of specific binding, such as binding potentials (e.g., BPND) and specific distribution volume (VS) should theoretically be more sensitive to underlying differences in TSPO expression. It is unknown, however, if unbiased and accurate estimates of these measures are obtainable for [11C]PBR28. The Simultaneous Estimation (SIME) method uses time-activity-curves from multiple brain regions with the aim to obtain a brain-wide estimate of VND, which can subsequently be used to improve the estimation of BPND and VS. In this study we evaluated the accuracy of SIME-derived VND, and the reliability of resulting estimates of specific binding for [11C]PBR28, using a combination of simulation experiments and in vivo studies in healthy humans. The simulation experiments showed that VND values estimated using SIME were both precise and accurate. Data from a pharmacological competition challenge showed that SIME provided VND values that were on average 19% lower than those obtained using the Lassen plot, but similar to values obtained using the Likelihood-Estimation of Occupancy technique. Test-retest data showed that SIME-derived VS values exhibited good reliability and precision, while larger variability was observed in SIME-derived BPND values. 
The results support the use of SIME for quantifying specific binding of [11C]PBR28, and suggest that VS can be used in preference to, or as a complement to, the conventional outcome measure VT. Additional studies in patient cohorts are warranted.


Introduction
The brain immune system has long been hypothesized to play an important role in the development and progression of neurological and psychiatric conditions (1,2). To date, the most common method for measuring immune activation in vivo is to use positron emission tomography (PET) to quantify the expression of the 18kDa translocator protein (TSPO) in the brain (3). TSPO is located in glial cells, including microglia and astrocytes, and has been considered a marker for activation of these cell types (4).
[ 11 C]PBR28 is a second-generation TSPO radioligand with improved signal-to-noise ratio (5) and reliability (6) relative to the first-generation radioligand (7). It is arguably the most widely applied second-generation radioligand for examining TSPO levels in psychiatric and neurological disorders (8)(9)(10). An important goal in the field has been the evaluation of [ 11 C]PBR28 as a diagnostic marker and as a tool for monitoring treatment strategies that target the immune system of the brain.
For this purpose, it is necessary to develop methods that provide reliable, accurate and precise estimates of outcome measures reflecting [ 11 C]PBR28 specific binding to TSPO.
Since there is no region devoid of TSPO in the brain, quantifying [ 11 C]PBR28 binding requires measurements of metabolite-corrected radioligand concentrations in the arterial plasma to be used as an arterial input function (AIF) in a kinetic model. When using an AIF, the most straightforward estimate of binding in the brain is the total distribution volume (V T ), which represents the sum of the radioligand specific (V S ) and non-displaceable (V ND ) distribution volumes. As such, V T can only be considered an indirect index of specific binding to TSPO. In contrast, V S or the non-displaceable binding potential (BP ND =V S /V ND ) are more direct estimates of specific binding (11) and should theoretically possess higher sensitivity to detect longitudinal changes or group differences. However, V S and BP ND calculated directly from the rate constants (estimated using a kinetic model with an AIF) are often unstable and unreliable (12,13), especially for TSPO radioligands (6,7), and therefore of limited utility in practice.
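For reference, the outcome measures above can be written in terms of the rate constants of the two-tissue compartment model (2TCM). The relations below are the standard 2TCM definitions, given here as a reminder rather than quoted from the original text; K_1, k_2, k_3 and k_4 denote the usual rate constants.

```latex
\begin{align}
V_{ND} &= \frac{K_1}{k_2}, &
BP_{ND} &= \frac{k_3}{k_4} = \frac{V_S}{V_{ND}}, \\
V_S &= \frac{K_1 k_3}{k_2 k_4}, &
V_T &= V_{ND} + V_S = \frac{K_1}{k_2}\left(1 + \frac{k_3}{k_4}\right).
\end{align}
```

Because V_S and BP_ND depend on all four rate constants, noise in any one of them propagates into these measures, which is consistent with the instability noted above.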
The kinetic modelling technique Simultaneous Estimation (SIME) aims to derive a reliable, brain-wide estimate of V ND in the absence of a reference region (14), so that more stable estimates of specific binding can be obtained. In brief, the method works by identifying the value for V ND that best describes the observed PET data across all brain regions considered in the analysis. So far, SIME has been evaluated for the serotonin 1A receptor radioligands [ 11 C]WAY-100635 and [ 11 C]CUMI-101, for which it yielded estimates close to "gold standard" measures of V ND (14). With regard to [ 11 C]PBR28, SIME was recently applied to quantify BP ND in a cohort of healthy controls and patients with Alzheimer's disease (15). That study concluded that SIME appeared to be useful for quantification of [ 11 C]PBR28, because V ND and BP ND were considered clearly identifiable and fell within ranges that were expected based on theory and previous publications. However, it remains unclear whether [ 11 C]PBR28 V S or BP ND derived using SIME is unbiased and reliable, as the method has not yet been evaluated in cases for which the true TSPO binding levels were known.
The aim of this study was to evaluate the accuracy and reliability of SIME for estimating [ 11 C]PBR28 V ND and specific binding. To examine accuracy, we a) performed a simulation experiment, b) compared SIME-V ND to V ND estimates obtained from a pharmacological competition challenge, and c) compared SIME-V ND , V S and BP ND values between high affinity binder (HAB) and mixed affinity binder (MAB) subjects in a large group of healthy controls. To examine reliability, test-retest properties of SIME-derived BP ND and V S values were assessed using a [ 11 C]PBR28 test-retest data set.
Methods
SIME and measures of specific binding
SIME constrains V ND (i.e., K1/k2 in a 2TCM) to be the same across a set of regions of interest (ROIs). A grid of possible V ND values is then evaluated as follows: For each possible V ND , all ROIs are simultaneously fitted using a constrained 2TCM (in which K1/k2 is forced to be equal to the V ND under evaluation). The corresponding residual sums of squares (RSS) across time frames and ROIs are then used to build an objective function for the purposes of determining V ND . The coordinate at which the objective function achieves a minimum is considered the optimal estimate of V ND for that PET measurement. For a more detailed explanation of the SIME algorithm see (14).
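The grid search described above can be sketched as follows. This is a minimal illustration and not the published implementation: the constrained 2TCM fit is abstracted into a caller-supplied function (`fit_constrained_rss`, a hypothetical name), the objective function is assumed to be the plain sum of RSS over ROIs, and the toy RSS curves and ROI values are invented purely so that the example runs.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def sime_grid_search(
    vnd_grid: Sequence[float],
    roi_names: Sequence[str],
    fit_constrained_rss: Callable[[str, float], float],
) -> Tuple[float, List[float]]:
    """Return the candidate V_ND with the lowest summed RSS and the full cost curve."""
    if not vnd_grid:
        raise ValueError("vnd_grid must contain at least one candidate value")
    cost: List[float] = []
    for vnd in vnd_grid:
        # Fit every ROI with K1/k2 fixed to the candidate V_ND and sum the RSS
        # (assumption: unweighted sum across ROIs).
        cost.append(sum(fit_constrained_rss(roi, vnd) for roi in roi_names))
    best_index = min(range(len(cost)), key=cost.__getitem__)
    return vnd_grid[best_index], cost


# Toy stand-in for the constrained 2TCM fit (hypothetical): the RSS of each ROI
# is a parabola in V_ND with its own minimum and curvature.
TOY_MINIMA: Dict[str, float] = {"frontal": 1.0, "thalamus": 1.2, "cerebellum": 1.1}
TOY_WEIGHTS: Dict[str, float] = {"frontal": 1.0, "thalamus": 1.0, "cerebellum": 2.0}


def toy_rss(roi: str, vnd: float) -> float:
    return TOY_WEIGHTS[roi] * (vnd - TOY_MINIMA[roi]) ** 2


grid = [round(0.5 + 0.05 * i, 2) for i in range(31)]  # 0.50, 0.55, ..., 2.00
best_vnd, cost_curve = sime_grid_search(grid, list(TOY_MINIMA), toy_rss)
print(best_vnd)
```

In a real analysis `fit_constrained_rss` would run the constrained 2TCM fit for one ROI and return its RSS; keeping it as a parameter separates the grid logic from the kinetic model.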
In this study, the SIME-derived estimates of V ND were subsequently used to calculate outcome measures of [ 11 C]PBR28 specific binding according to

$$V_S = V_T - V_{ND} \qquad (1)$$

$$BP_{ND} = \frac{V_T - V_{ND}}{V_{ND}} \qquad (2)$$

where V T was independently derived from an unconstrained 2TCM in a target ROI. The primary target ROI used in this study, unless otherwise specified, is the whole grey matter defined using FreeSurfer (v5.0.0, http://surfer.nmr.mgh.harvard.edu/) segmentation.

Subjects and data
This study includes three different datasets of healthy subjects that underwent PET examinations with [ 11 C]PBR28 (Table 1). All subjects gave written informed consent prior to their participation.
Their eligibility was confirmed via a health screening, evaluation of their medical history, physical and neurological examinations and routine blood tests.

KI [ 11 C]PBR28 database
The Karolinska Institutet (KI) [ 11 C]PBR28 database currently consists of 54 subjects (30 HABs and 24 MABs; 32 males and 22 females) who participated as healthy controls in a set of previously published (6,9,16) or ongoing [ 11 C]PBR28 studies. All subjects were examined on the same PET system using identical protocols for radioligand synthesis, acquisition of transmission and emission data, and image reconstruction and analysis, as described below.
PET measurements were carried out at the PET center at KI, Stockholm, on a High-Resolution (N=19) or 9x360s (N=35). PET images were then reconstructed using ordered subsets expectation maximization, including modelling of the point spread function.
Arterial blood samples were acquired during the first 5 minutes of each PET examination using an automated blood sampling system (ABSS, Alogg Technologies, Mariefred, Sweden). In addition, manual samples (1-3 mL) were drawn between 1 and 20 minutes post injection, in 2-minute intervals. Afterwards, manual samples were acquired in 10-minute intervals until the end of the examination. Radioactivity was immediately measured in a well counter that was cross-calibrated with the PET system. Corresponding plasma samples were obtained by centrifuging the blood samples and measuring radioactivity in the ensuing plasma using the same well counter.
Whole-blood time activity curves (TACs) were obtained by combining the ABSS and manual blood sample curves. The plasma radioactivity curve was generated by multiplying the whole-blood TAC with plasma-to-blood ratios estimated from manual plasma samples. The parent fraction of the radioligand was measured as described previously (6). To estimate the parent fraction at intermediate time points, a Hill function was fitted to the measurements and multiplied with the plasma curve to produce the final metabolite-corrected plasma curve used as AIF for each examination.
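The construction of the AIF described above can be illustrated with a short sketch. The exact Hill parameterization used in the study is not given in the text, so the form below (parent fraction equal to 1 at injection and declining towards 1 - a) is an assumption, as are the function names and all numerical values.

```python
from typing import List, Sequence, Tuple


def hill_parent_fraction(t_min: float, a: float, b: float, c: float) -> float:
    """Assumed Hill-type parent fraction: 1 at t = 0, approaching 1 - a at late times."""
    if t_min <= 0:
        return 1.0
    return 1.0 - a * t_min ** b / (c + t_min ** b)


def metabolite_corrected_aif(
    times_min: Sequence[float],
    whole_blood: Sequence[float],
    plasma_to_blood: Sequence[float],
    hill_params: Tuple[float, float, float],
) -> List[float]:
    """Whole blood -> plasma (ratio) -> parent compound in plasma (Hill fraction)."""
    a, b, c = hill_params
    aif = []
    for t, blood, ratio in zip(times_min, whole_blood, plasma_to_blood):
        plasma = blood * ratio  # plasma radioactivity at time t
        aif.append(plasma * hill_parent_fraction(t, a, b, c))
    return aif


# Hypothetical values, chosen only to make the example runnable.
times = [0.0, 10.0, 60.0]
blood = [0.0, 10.0, 4.0]
ratios = [1.5, 1.5, 1.5]
aif = metabolite_corrected_aif(times, blood, ratios, (0.9, 2.0, 100.0))
print(aif)
```

The sketch assumes the plasma-to-blood ratio has already been interpolated to the time points of the whole-blood curve.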
T1-weighted Magnetic Resonance Imaging (MRI) images were obtained for all subjects on a 3-T General Electric Discovery MR750 system (GE, Milwaukee, WI). ROI delineation was performed using the FreeSurfer software resulting in 12 ROIs: whole grey matter (GM), frontal cortex, temporal cortex, parietal cortex, occipital cortex, limbic lobe, thalamus, striatum, insula, anterior cingulate cortex, posterior cingulate cortex and cerebellum. All ROIs were co-registered to the corresponding PET image, allowing for extraction of regional TACs. Since a subset of subjects in the database underwent only 75 minutes of PET examination, all TACs in this study were truncated at 75 minutes to allow for consistent pooling and comparisons, unless otherwise specified.

Pharmacological competition data
Data from five healthy control subjects (all HABs, all males) who participated in a previous pharmacological competition study (18), carried out at IMANOVA Ltd, London, were reanalysed to examine the correspondence between SIME-V ND and V ND estimates obtained from an XBD173 blocking challenge using the Lassen plot. After a baseline PET, subjects received an oral dose of the selective TSPO agonist XBD173 (10 to 90 mg), followed two hours later by a repeated [ 11 C]PBR28 examination. Radiochemistry, imaging protocols, reconstruction, retrieval of TACs and AIF are described in the original study (18). For the present reanalysis, 9 ROI TACs were obtained from both the baseline and blocking measurement: frontal cortex, occipital cortex, temporal cortex, parietal cortex, hippocampus, amygdala, thalamus, striatum and cerebellum.

Test-Retest data
A subset of subjects (N=12) in the KI [ 11 C]PBR28 database participated in a previously published test-retest study (6). For six of them, two PET measurements were carried out on the same day, and for the other six, the PET scans were taken 2-5 days apart. One PET examination performed on a HAB subject was shortened (60 min) due to technical reasons, and this participant was therefore excluded from the test-retest analysis in this study. Image analysis and kinetic modelling for all remaining 11 test-retest subjects were carried out as described in section 1 above.

This process was then repeated for all frames, resulting in one simulated noise instance. In our simulation study, 1000 noise instances were created in this way. These simulations can be conceptually interpreted as an approximation of a situation in which a single subject has been scanned 1000 times. For an in-depth explanation of the simulation procedure see supplementary material in (19). Finally, SIME was applied to each simulated noise instance and estimates of V ND were obtained and compared against the "true" V ND for the underlying subject. The same procedure as described above was then applied to a MAB subject randomly selected from the KI [ 11 C]PBR28 database.
In order to assess the robustness of SIME when applied to [ 11 C]PBR28 we scaled the noise up by 50% in the simulated data by multiplying each residual by 1.5.
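The noise amplification can be sketched as below. Since the full simulation procedure is described in (19) and not reproduced here, the sketch assumes that a noise instance is formed by adding one residual per frame to a noise-free (model-fitted) TAC; the function name and the numbers are hypothetical.

```python
from typing import List, Sequence


def scaled_noise_instance(
    fitted_tac: Sequence[float],
    residuals: Sequence[float],
    scale: float = 1.5,
) -> List[float]:
    """Add residuals to a noise-free TAC after multiplying each residual by `scale`."""
    if len(fitted_tac) != len(residuals):
        raise ValueError("fitted_tac and residuals must have one value per frame")
    return [value + scale * res for value, res in zip(fitted_tac, residuals)]


# Hypothetical three-frame example.
noisy = scaled_noise_instance([10.0, 20.0, 15.0], [1.0, -2.0, 0.5])
print(noisy)
```

Setting `scale=1.0` would give the original noise level, and 1.5 corresponds to the 50% amplification used here.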

XBD173 competition challenge
V T values for each ROI (listed above in section "1. KI [ 11 C]PBR28 database") and for each subject were obtained using the unconstrained 2TCM for all baseline and blocking examinations. The revised Lassen plot (20) was applied to estimate V ND for each subject separately. In addition to the Lassen plot, it has been suggested that occupancy and V ND can be estimated from blocking data using multi-level modelling with likelihood-based techniques (21,22). Here, we employed the Likelihood Estimation of Occupancy (LEO) method (21) as a complement to the Lassen plot. LEO has been shown to produce highly accurate estimates of V ND for another radioligand, but a prerequisite of the model is that the ROI variance-covariance matrix is known. This matrix can be estimated from an independent test-retest dataset for the same radioligand. We therefore applied LEO to the blocking data to estimate V ND , using the test-retest [ 11 C]PBR28 examinations described above to estimate this matrix. Finally, SIME was applied to all baseline measurements, and SIME-derived V ND values were compared to the outcomes from the Lassen plots and LEO. Both 70 and 90 minute TACs were used for SIME, Lassen plot and LEO, in order to examine the stability of V ND over time.
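As an illustration of the Lassen plot, the sketch below regresses the regional change in V T (baseline minus block) on baseline V T: the slope estimates occupancy and the x-intercept estimates V ND. Ordinary least squares is assumed here, and the regional values are synthetic numbers generated from a known V ND of 1.0 and an occupancy of 0.6; none of them come from the study.

```python
from typing import Sequence, Tuple


def lassen_plot(
    vt_baseline: Sequence[float], vt_block: Sequence[float]
) -> Tuple[float, float]:
    """Return (occupancy, V_ND) from regional V_T values at baseline and after block."""
    n = len(vt_baseline)
    if n != len(vt_block) or n < 2:
        raise ValueError("need matching baseline/block values for at least two ROIs")
    x = list(vt_baseline)
    y = [base - block for base, block in zip(vt_baseline, vt_block)]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("baseline V_T values must differ between ROIs")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                    # occupancy
    intercept = mean_y - slope * mean_x
    v_nd = -intercept / slope            # x-intercept of the regression line
    return slope, v_nd


# Synthetic data: V_ND = 1.0, occupancy = 0.6, regional V_S = 1, 2, 3, 4.
baseline = [2.0, 3.0, 4.0, 5.0]
blocked = [1.4, 1.8, 2.2, 2.6]
occupancy, v_nd = lassen_plot(baseline, blocked)
print(occupancy, v_nd)
```

With noisy regional estimates and low occupancy, the x-intercept is an extrapolation far from the data, which is one reason the estimate of V ND can become unstable.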

Differences between HABs and MABs
One aim of the study was to examine differences in SIME-V ND , and ensuing estimates of specific binding, between HAB and MAB subjects. For this purpose, SIME was applied to all subjects' ROI TACs. The separation between HABs and MABs using V T and SIME-derived outcomes (i.e. V S and BP ND ) was then assessed by calculating Hedges' g effect sizes of group differences, as well as percentage differences.
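The two separation metrics can be computed as in the following sketch. Hedges' g is implemented with the common small-sample correction factor 1 - 3/(4(n1 + n2) - 9); the percentage difference is expressed relative to the second group's mean, which is an assumption because the text does not state the reference. The group values are hypothetical.

```python
from statistics import mean, variance
from typing import Sequence


def hedges_g(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Standardized mean difference (a minus b) with small-sample bias correction."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = (
        (n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)
    ) / (n1 + n2 - 2)
    cohens_d = (mean(group_a) - mean(group_b)) / pooled_var ** 0.5
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d * correction


def percent_difference(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Mean of group a relative to the mean of group b, in percent (assumed reference)."""
    return 100 * (mean(group_a) - mean(group_b)) / mean(group_b)


# Hypothetical outcome values for two genotype groups.
hab = [2.0, 3.0, 4.0]
mab = [1.0, 1.5, 2.0]
g = hedges_g(hab, mab)
pct = percent_difference(hab, mab)
print(g, pct)
```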
In the preliminary analysis, we found an unexpected group difference in SIME-V ND between

Test-Retest analysis
For the subjects in the test-retest study (6), V T values from GM and SIME-derived outcomes (V ND , V S and BP ND ) were obtained as described above for both test and retest PET examinations. The intraclass correlation coefficient (ICC) was used as a measure of test-retest reliability; percentage average absolute variability or test-retest variability (AbsVar) was used as a measure of reproducibility; and the standard error of measurement (SEM; (23)) was used as a measure of precision.
AbsVar was included for reference since it is the most common metric reported in PET test-retest studies. However, it should be noted that AbsVar scales with the additive magnitude of the outcome and is therefore not suitable for comparing absolute test-retest performance between different outcome measures.
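A minimal sketch of the three test-retest metrics is given below. The ICC variant is not specified in the text, so a one-way random-effects ICC(1,1) is assumed; SEM is computed as SD × sqrt(1 - ICC) using the SD of all measurements, and AbsVar as the mean absolute difference relative to the test-retest mean. The function name and the data are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def reliability_metrics(test: Sequence[float], retest: Sequence[float]) -> Dict[str, float]:
    """ICC(1,1), AbsVar (%) and SEM for paired test-retest measurements."""
    n = len(test)
    if n != len(retest) or n < 2:
        raise ValueError("need paired measurements from at least two subjects")
    k = 2  # measurements per subject
    subject_means = [(t + r) / 2 for t, r in zip(test, retest)]
    grand_mean = mean(subject_means)
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (t - m) ** 2 + (r - m) ** 2 for t, r, m in zip(test, retest, subject_means)
    ) / (n * (k - 1))
    icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
    absvar = 100 * mean(2 * abs(t - r) / (t + r) for t, r in zip(test, retest))
    sem = stdev(list(test) + list(retest)) * max(0.0, 1 - icc) ** 0.5
    return {"ICC": icc, "AbsVar": absvar, "SEM": sem}


# Hypothetical outcome values for four subjects.
metrics = reliability_metrics([2.0, 3.0, 4.0, 5.0], [2.2, 2.9, 4.1, 4.8])
print(metrics)
```

Because AbsVar divides by the mean of the two measurements, the same absolute error gives a larger AbsVar for an outcome with smaller values, whereas SEM stays in the units of the outcome.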
All kinetic modelling in this study was performed in Matlab 2014 (Mathworks, Natick, MA) and all statistical analyses were performed in R (v3.3.2, "Sincere Pumpkin Patch").

Results
Simulations
Figure 1 shows the results from the simulation experiment. V ND estimated using SIME showed high precision and little bias for both genotypes (Panel A: V ND:True = 1.15, mean V ND:SIME = 1.17 ± 0.035 SD; Panel B: V ND:True = 0.62, mean V ND:SIME = 0.63 ± 0.018 SD). When amplifying the noise by 50%, SIME still provided high accuracy when estimating the "true" V ND , although with somewhat lower precision (Panel C: V ND:True = 1.15, mean V ND:SIME = 1.17 ± 0.082 SD; Panel D: V ND:True = 0.62, mean V ND:SIME = 0.63 ± 0.027 SD).
On average, V ND estimated with SIME was lower than that obtained using the Lassen plot, but similar to that obtained with LEO. All three methods showed lower V ND when shorter time activity curves (TACs) were used.
The shaded area around the lines represents 1 SE. There was no overlap between the two lines in the interval seen in the subplots, as determined by 95% confidence intervals.
Table 2 displays the test-retest metrics for all outcome measures, using the GM as ROI. V S from SIME showed excellent reliability (ICC > 0.9), while BP ND showed poor reliability (ICC < 0.75) (24). V S from SIME and V T from the unconstrained 2TCM showed similar reliability and precision to each other (see Table 2).
Figure 4: Separation of genotype groups using different outcomes. Mean percentage differences suggest that both V T from 2TCM (A) and V S estimated using V ND from SIME (B) showed a strong separation between genotype groups, with HABs having double the V S compared to MABs. SIME-BP ND showed lower mean percentage separation between HABs and MABs (C) compared to V T and V S . When controlling for the difference in V ND between genotype groups by using an average input function for all subjects, similar results were obtained (D).
Scatter plots and Pearson's correlation coefficients (r) between V T from 2TCM and V S , BP ND and V ND from SIME with the whole grey matter as region of interest.

Discussion
Accurate, reliable and precise quantification of [ 11 C]PBR28 specific binding is of high interest for clinical research, as it would theoretically lead to easier detection of effects, allowing for higher power or lower sample sizes to be used, and thereby reducing the costs in PET TSPO studies. The purpose of this study was to evaluate a new method for deriving estimates that reflect [ 11 C]PBR28 specific binding (14), which has shown promising potential for [ 11 C]PBR28 group comparisons (15).
We simulated [ 11 C]PBR28 TACs and examined the ability of SIME to estimate a known underlying V ND value. The results showed that, in simulations, SIME-derived V ND values were both accurate and precise (Figure 1 A and B). This was also the case when the amount of noise in the TACs was increased above realistic levels, suggesting that SIME is robust to high levels of noise ( Figure 1 C and D).
We also compared SIME against "gold standard" measures of V ND by using data from an XBD173 blocking challenge (18). SIME, applied to the baseline scans, yielded V ND values in the same range as the recently developed LEO technique (21), but lower than V ND values obtained with the revised Lassen plot ( Figure 2). Notably, for another radioligand ([ 11 C]DASB) it has previously been reported that the Lassen plot tends to overestimate V ND , in particular at low occupancy levels, whereas V ND estimates obtained from LEO were shown to be largely unbiased, given a sufficiently powered test-retest dataset (21). It is therefore likely that the higher V ND values seen with the Lassen plot in the present study reflect, at least in part, inaccuracies in the Lassen plot rather than in SIME. Both the Lassen plot and LEO showed lower V ND when shorter TACs were analyzed, suggesting that estimates of V ND are sensitive to scan duration. This trend was also reflected by the SIME method, which showed a similar percentage decrease in V ND .
In the 2TCM, V ND only reflects non-specific binding and free radioligand in tissue, which together constitute the non-displaceable binding. Since it is generally believed that the genotype only affects the radioligand's affinity to TSPO, it follows that no difference in V ND estimates between genotype groups is expected. However, in this study SIME-derived V ND estimates showed a clear difference between HAB and MAB subjects ( Figure 3A). We have identified three potential explanations for this observation: 1) the SIME approach is sensitive to "spill in" from the specific compartment to the non-displaceable compartment, so that SIME-derived V ND values are inflated by high V S values; 2) there is a systematic error in the measurement of the AIF for HABs and/or MABs that affects the estimated V ND ; 3) a subject's V ND is dependent on the TSPO genotype (for example through a TSPO affinity-dependent transport across the blood-brain barrier). To assess the first possibility, we performed additional simulations (not shown) in which the k3/k4 ratio (i.e., the BP ND ) was substantially increased and decreased, respectively, while the true V ND was kept constant. These additional simulations showed that SIME produced similar estimates of V ND regardless of the k3/k4 values, suggesting that explanation 1) above is unlikely to account for the observed difference in V ND between genotype groups. As for the second possibility, we observed a clear difference between genotype groups in both AUC and shape of the plasma TAC ( Figure 3C and 3D). When using a normalized input function for all subjects, the differences in SIME-V ND between HABs and MABs disappeared ( Figure 3B), an observation consistent with explanation 2), but also with explanation 3) above. However, conclusions about underlying biology should not be drawn solely based on the performance of models. To date, there exists no published [ 11 C]PBR28 blocking data examining V ND in MAB subjects. Hence, the observed difference between genotypes cannot be fully verified, and this phenomenon warrants further investigation.
In this study, we compared SIME-derived binding values between TSPO genotype groups ( Figure 4). When using individual AIFs, SIME V S in HABs (mean = 2.69) was almost exactly double the value of V S in MAB subjects (mean = 1.36). Assuming SIME-V S is valid, this is to be expected since the low-affinity-binder allele shows negligible binding of [ 11 C]PBR28 to TSPO, so that HAB subjects effectively have twice as many TSPO binding sites as MAB subjects (25).
The reliability of [ 11 C]PBR28 V S and BP ND in GM was evaluated using a test-retest data set. SIME-derived V S showed high reliability and precision, reaching the threshold recommended for clinical use (ICC > 0.9) (24). SIME-derived BP ND showed both less separation between genotype groups and lower reliability, compared to both V T and SIME-derived V S ( Figure 4C). One potential explanation for these findings is that small amounts of measurement error in both the numerator (V S ) and the denominator (V ND ) of SIME-derived BP ND (eq 2) lead to an amplified error in the quotient, while this is not the case for the subtraction carried out to calculate V S (eq 1).
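One way to make this argument concrete is a first-order sensitivity analysis, using V_S = V_T - V_ND and BP_ND = V_S / V_ND. The sketch below is a simplification that is not part of the reported analyses: it assumes that V_T is error-free and that only V_ND carries a small error \varepsilon.

```latex
\begin{align}
V_S &= V_T - V_{ND}
  &\Rightarrow\quad \frac{|\Delta V_S|}{V_S} &= \frac{\varepsilon}{V_S}, \\
BP_{ND} &= \frac{V_T}{V_{ND}} - 1
  &\Rightarrow\quad \frac{|\Delta BP_{ND}|}{BP_{ND}}
  &\approx \frac{V_T}{V_{ND}^{2}} \cdot \frac{\varepsilon}{BP_{ND}}
   = \frac{\varepsilon}{V_S} \cdot \frac{V_T}{V_{ND}} .
\end{align}
```

Under these assumptions, the relative error in BP_ND exceeds that in V_S by the factor V_T / V_ND = 1 + BP_ND, which is always greater than 1.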
The results of this study support the use of SIME-derived V S as an outcome measure for future [ 11 C]PBR28 examinations in preference to, or to complement, V T from the unconstrained 2TCM. This is in line with the principle that V S reflects the level of specific binding more directly than V T , and that a difference of interest between subjects or groups is expected to be confined to V S . For instance, if V ND represents 30% of the signal, a 25% increase in V S would be reflected by a 17.5% increase in V T , assuming both outcomes show equal variance. These hypothesized differences in sensitivity should be further tested in clinical studies using [ 11 C]PBR28. To facilitate this, we publicly share all code for executing SIME in Matlab (github.com/martinschain/SIME). SIME is also implemented in the open-source R-package kinfitr for kinetic modeling of brain PET data (github.com/mathesong/kinfitr).
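The 17.5% figure in the example above follows directly from V_T = V_ND + V_S. Writing the baseline signal as V_ND = 0.3 V_T and V_S = 0.7 V_T, and denoting the value after the increase by V_T' (a symbol introduced here for illustration):

```latex
\begin{align}
V_T' &= V_{ND} + 1.25\,V_S
      = 0.3\,V_T + 1.25 \times 0.7\,V_T
      = 1.175\,V_T , \\
\frac{V_T' - V_T}{V_T} &= 0.25 \times 0.7 = 0.175 = 17.5\% .
\end{align}
```

More generally, a fractional change in V_S is attenuated in V_T by the factor V_S / V_T, so the attenuation grows as the non-displaceable share of the signal increases.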