Parametric imaging of dual-time window [18F]flutemetamol and [18F]florbetaben studies

Optimal pharmacokinetic models for quantifying amyloid beta (Aβ) burden using both [18F]flutemetamol and [18F]florbetaben scans have previously been identified at a region of interest (ROI) level. The purpose of this study was to determine optimal quantitative methods for parametric analyses of [18F]flutemetamol and [18F]florbetaben scans. Forty-six participants were scanned on a PET/MR scanner using a dual-time window protocol and either [18F]flutemetamol (N = 24) or [18F]florbetaben (N = 22). The following parametric approaches were used to derive distribution volume ratio (DVR) estimates: reference Logan (RLogan), receptor parametric mapping (RPM), the two-step simplified reference tissue model (SRTM2) and multilinear reference tissue models (MRTM0, MRTM1, MRTM2), all with cerebellar grey matter as reference tissue. In addition, a standardized uptake value ratio (SUVR) was calculated for the 90-110 min post-injection interval. All parametric images were assessed visually. Regional outcome measures were compared with those from a validated ROI method, i.e. DVR derived using RLogan. Visually, RPM and SRTM2 performed best across tracers and, together with SUVR, provided the highest AUC values for differentiating between Aβ-positive and Aβ-negative scans ([18F]flutemetamol: AUC range 0.96-0.97; [18F]florbetaben: AUC range 0.83-0.85). Outcome parameters of most methods were highly correlated with those of the reference method (R² ≥ 0.87), while the lowest correlations were observed for MRTM2 (R² = 0.71-0.80). Furthermore, bias was low (≤ 5%) and independent of underlying amyloid burden for MRTM0 and MRTM1. The optimal parametric method differed per evaluated aspect; however, the best compromise across aspects was found for MRTM0, followed by SRTM2, for both tracers. SRTM2 is the preferred method for parametric imaging because, in addition to its good performance, it provides a measure of relative perfusion (R1), which may be useful for measuring disease progression.


Introduction
Early detection of amyloid-beta (Aβ) plaques has become increasingly relevant, in particular for identifying individuals for secondary prevention trials and for monitoring disease progression in Alzheimer's disease (AD) (Farrar et al., 2017). These Aβ plaques can be identified in vivo with positron emission tomography (PET), using either a dichotomous classification of Aβ burden (Mallik et al., 2017) or a more fine-grained quantitative measure of Aβ, which is required for measuring the extent of pathology or for quantitatively tracking disease progression (Lammertsma, 2017).
For these quantitative applications, it has been suggested that dynamic or dual-time window scanning protocols should be used to obtain the most accurate Aβ estimates (van Berckel et al., 2013; Bullich et al., 2018; Heeman et al., 2019). Moreover, since amyloid pathology might not follow anatomical boundaries, voxel-wise quantitative analyses might be preferred, as they provide more detailed spatial information independent of predefined regions. In addition, quantitative parametric images allow for voxel-by-voxel comparisons between groups, or over time within the same subjects. A comprehensive evaluation of parametric methods for voxel-wise analysis is therefore warranted. However, as different tracers have different kinetics, the optimal parametric method needs to be established for each Aβ tracer separately. In addition, the ability to obtain a voxel-wise measure of the relative tracer influx rate (R1) could play a role when deciding which method is most suitable for parametric analysis. This additional parameter can be used as a proxy for relative cerebral blood flow (Heeman et al., 2021) and may therefore serve as a measure of disease severity or progression, as first shown by single photon emission computed tomography (SPECT) and [15O]H2O PET studies, and more recently replicated with arterial spin labelling (ASL) (Collij et al., 2016; Binnewijzend et al., 2016; Jagust et al., 1997).
For most amyloid tracers, voxel-wise quantitative approaches have been evaluated to some extent (Bullich et al., 2018; Yaqub et al., 2008; Verfaillie et al., 2021; Heurling et al., 2015). However, previous [18F]flutemetamol and [18F]florbetaben studies used full dynamic acquisitions that did not include the recommended 90-110 min post-injection (p.i.) scanning window used for visual assessment and for deriving standardized uptake value ratios (SUVR) (Heurling et al., 2015; Becker et al., 2013). Further, only Heurling and colleagues evaluated parametric methods for full dynamic [18F]flutemetamol studies against the gold standard, i.e. full kinetic analysis using an arterial plasma input function (Heurling et al., 2015). They showed that the best correlations with the gold standard were obtained using reference Logan (RLogan), the 70-90 min SUVR (SUVR70-90) and receptor parametric mapping (RPM) (R² > 0.94). For full dynamic [18F]florbetaben studies, the performance of two parametric methods has only been evaluated visually, while regional reference tissue models have been quantitatively compared with the gold standard (Becker et al., 2013). It remains unclear whether results from full dynamic acquisitions remain valid for a dual-time window protocol, given that the performance of parametric methods could be compromised by the gap in the data. Previous work has shown that, in some cases, a gap in the data may lead to suboptimal curve fitting, which in turn may result in a small bias (Heeman et al., 2019). This impact is mediated by noise and may therefore affect parametric methods differently, depending on their sensitivity to noise. In addition, small interpolation or coregistration errors could also affect the performance of parametric methods.
These shortened acquisition protocols are especially relevant for fluorine-18 tracers such as [18F]flutemetamol and [18F]florbetaben, considering that routine acquisition of lengthy dynamic scans (110 min duration) is not feasible, especially for older participants (Bullich et al., 2018; Heeman et al., 2019). In addition to increasing patient comfort, these protocols can increase patient throughput and allow more cost-efficient use of tracer batch productions compared with full dynamic acquisitions. While still less efficient than standard static acquisitions, a dual-time window protocol provides an additional parameter (R1), which can be advantageous for studies focusing on disease progression (Tiepolt et al., 2016; Son et al., 2020; Daerr et al., 2016). These characteristics appear to have increased the popularity of dual-time window acquisitions, as illustrated by their adoption in large-scale studies such as AMYPAD (Farrar et al., 2017).
The present study aimed to evaluate the performance of the most widely used parametric methods for the specific case of dual-time window [18F]flutemetamol and [18F]florbetaben scans. Performance was evaluated with respect to three aspects: 1) visual assessment of parametric images, 2) ability to differentiate between Aβ-positive and Aβ-negative scans, and 3) quantitative accuracy and precision relative to a reference method.

Participants
Forty-six cognitively unimpaired subjects were selected from the AMYPAD PNHS. In short, all subjects underwent neurological and neuropsychological assessment and had at least one PET and one MR scan available. All PET scans were visually assessed by a trained nuclear medicine physician (BvB) following the formal guidelines for reading [18F]flutemetamol and [18F]florbetaben scans, as defined by the respective manufacturers. Approximately half of the participants (N = 24) were scanned using [18F]flutemetamol and the other half (N = 22) using [18F]florbetaben. For both tracers, 50% of the participants were Aβ-positive. All participants provided written informed consent before participating in the study, in accordance with the Declaration of Helsinki. The Medical Ethics Review Committee of Amsterdam UMC, location VUmc, approved the study protocol (EudraCT number: 2018-002277-22).

Image acquisition
All subjects underwent a dynamic PET scan on a Philips Ingenuity TF PET/MR scanner (Philips Medical Systems, Best, The Netherlands) according to a dual-time window protocol (Heeman et al., 2019). This protocol consisted of an initial dynamic scan from 0 to 30 min p.i., followed by a 60 min break and a second scan from 90 to 110 min p.i. Prior to each scan, a T1-weighted gradient echo MR pulse sequence was acquired for attenuation correction. At the start of the early PET scan, participants received a bolus injection of either [18F]flutemetamol (N = 24, 184 ± 11 MBq) or [18F]florbetaben (N = 22, 280 ± 18 MBq), following the manufacturers' dosage guidelines. Scans were reconstructed into 22 frames (6 × 5, 3 × 10, 4 × 60, 2 × 150, 2 × 300 and 1 × 600 s for the first part of the scan, and 4 × 300 s for the second part) with a matrix size of 128 × 128 × 90 and a voxel size of 2 × 2 × 2 mm, using the standard line-of-response row-action maximum-likelihood algorithm (LOR-RAMLA) for the brain (Hu et al., 2007). All usual corrections, including those for MR-based attenuation (MRAC) (Hu et al., 2010), decay, scatter, randoms and dead time, were applied. In addition, structural T1-weighted MR images were acquired on the same Philips Ingenuity TF PET/MR scanner within, on average, 4.6 ± 3.0 months of the PET scan for [18F]flutemetamol and 4.0 ± 2.5 months for [18F]florbetaben (maximum difference between PET and MR scan: one year).
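The frame definitions above can be expanded into frame start, end and mid-times, which reference tissue models need as input. The sketch below is illustrative only: the helper names (`expand`, `dual_window_frames`) are hypothetical and not part of any software used in this study; only the frame durations and the 90 min restart time come from the text. It also makes the 60 min gap between both parts explicit.

```python
# Hypothetical sketch: expand the dual-time window frame definitions into
# (start, end, mid) times in seconds post injection. Frame durations and the
# 90 min restart come from the text; everything else is illustrative.

EARLY_FRAMES = [(6, 5), (3, 10), (4, 60), (2, 150), (2, 300), (1, 600)]  # (count, s)
LATE_FRAMES = [(4, 300)]                                                # (count, s)
LATE_START_S = 90 * 60  # second scan starts 90 min p.i.


def expand(blocks, start_s):
    """Turn (count, duration) blocks into consecutive (start, end) frames."""
    frames = []
    t = start_s
    for count, duration in blocks:
        for _ in range(count):
            frames.append((t, t + duration))
            t += duration
    return frames


def dual_window_frames():
    """Return (start_s, end_s, mid_s) for all frames, early part first."""
    frames = expand(EARLY_FRAMES, 0) + expand(LATE_FRAMES, LATE_START_S)
    return [(start, end, (start + end) / 2) for start, end in frames]


frames = dual_window_frames()
n_early = sum(count for count, _ in EARLY_FRAMES)
gap_s = frames[n_early][0] - frames[n_early - 1][1]

print(len(frames))          # 22 frames in total
print(frames[-1][1] / 60)   # 110.0 (end of the late part, min p.i.)
print(gap_s / 60)           # 60.0 (gap between both parts, min)
```

Mid-frame times are a common, but not the only, choice of sampling time for kinetic fitting.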

Image processing
First, all PET scans were visually checked for between-frame motion. As none showed motion, all were included for further analysis. Next, both structural T1-weighted MR images and early PET images (0-30 min p.i.) were coregistered to their corresponding late PET images (90-110 min p.i.) and visually checked. This was carried out in two steps: first, the T1-weighted MR image and the early MRAC image were coregistered with the late MRAC image; then, the transformation matrix obtained for the early MRAC image was applied to each frame of the early PET image sequence. Following coregistration, the early and late parts of the dual-time window PET scan were combined into a single file using an in-house developed, MATLAB-based software tool. Subsequently, a PVE-lab-based implementation of SPM8 (Rask et al., 2004) was used to segment the MR image into grey matter, white matter and cerebrospinal fluid (CSF). The reference tissue (cerebellar grey matter) region of interest (ROI) was delineated based on the Hammers atlas (Hammers et al., 2003), and time-activity curves (TACs) were extracted.

Parametric analysis
Given that no arterial input data were acquired within the AMYPAD PNHS, a regional reference tissue method was used as the standard against which to evaluate all parametric methods. RLogan was chosen as the "regional standard" based on previous studies in which this method showed good correlation with, and low bias relative to, the gold standard (arterial input), as well as stability against variation in confounding factors (Heeman et al., 2021; Nelissen et al., 2009).
First, missing data points in the reference tissue TACs were interpolated using the reversible two-tissue compartment model (4 rate constants) with an additional blood volume fraction parameter (2T4k_Vb) and a typical, tracer-specific plasma input function, following the interpolation procedure described previously (Heeman et al., 2019). Next, the PPET software tool (Boellaard et al., 2006) was used to compute distribution volume ratio (DVR), non-displaceable binding potential (BPND) or standardized uptake value ratio (SUVR) parametric images using the following methods: RLogan, receptor parametric mapping (RPM), the voxel-based implementation of the two-step simplified reference tissue model (SRTM2), multilinear reference tissue models (MRTM0, MRTM1, MRTM2), and SUVR (calculated for the 90-110 min p.i. interval), all with cerebellar grey matter as reference tissue (Logan et al., 1996; Gunn et al., 1997; Lammertsma and Hume, 1996; Wu and Carson, 2002; Ichise et al., 2003). In addition, parametric relative delivery (R1) images were generated using RPM and SRTM2. For SRTM2, k2' was determined across all voxels with a BPND higher than 0.05 from a first RPM run, whereas this parameter was omitted in PPET's implementation of RLogan. In line with previous studies, the RLogan linearization start time (t*) was set to 50 min p.i. for both tracers (Heeman et al., 2021; Heurling et al., 2015; Logan et al., 1996). For both tracers, RPM's basis function (BF) settings were first optimized by evaluating a range of BF settings (see supplementary materials, Table S1) and selecting the settings that yielded the highest correlation and the lowest bias (as assessed by R² and the slope of the regression line) between RPM-derived DVR and regional standard-derived DVR.
Table 1 (fragment): Age 71.7 ± 6.0, 69.9 ± 5.1, 73.5 ± 6.5, 69.… Values depicted as mean ± SD. * p < 0.05, ** p < 0.10, compared with the Aβ-positive group.
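As an illustration of the RLogan step, the sketch below estimates DVR as the slope of the reference Logan plot for frames at or after t* = 50 min, without the k2' term (mirroring its omission in PPET's RLogan described above). It is a minimal sketch under stated assumptions, not PPET's implementation: TACs are assumed to be sampled from injection onward with the 30-90 min gap already filled by interpolation, integrals use the trapezoidal rule, and the slope is an ordinary least-squares fit. The function names are hypothetical.

```python
# Minimal sketch of reference Logan (RLogan) DVR estimation without the k2'
# term. Assumptions (not from PPET): times in min from injection, the 30-90
# min gap already filled by interpolation, trapezoidal integration and an
# ordinary least-squares fit. Function names are hypothetical.


def cumulative_trapezoid(t, y):
    """Running integral of y over t, starting at t[0], via the trapezoidal rule."""
    out = [0.0]
    for i in range(1, len(t)):
        out.append(out[-1] + 0.5 * (y[i] + y[i - 1]) * (t[i] - t[i - 1]))
    return out


def rlogan_dvr(t, ct, cr, t_star=50.0):
    """DVR as the slope of int(C_T)/C_T versus int(C_R)/C_T for t >= t_star."""
    int_ct = cumulative_trapezoid(t, ct)
    int_cr = cumulative_trapezoid(t, cr)
    points = [(int_cr[i] / ct[i], int_ct[i] / ct[i])
              for i in range(len(t)) if t[i] >= t_star and ct[i] > 0]
    if len(points) < 2:
        raise ValueError("need at least two frames at or after t_star")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Synthetic, noise-free example: target TAC exactly 1.4 x reference TAC.
t = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]          # min p.i.
cr = [0.0, 5.0, 4.0, 3.2, 2.6, 2.2, 1.9, 1.7, 1.5, 1.4, 1.3, 1.2]
ct = [1.4 * c for c in cr]
print(round(rlogan_dvr(t, ct, cr), 6))  # 1.4
```

Because the synthetic target curve is exactly 1.4 times the reference curve, the Logan plot is a straight line through the origin with slope 1.4. With noisy voxel-level data, noise in C_T enters both axes of the plot, which is one reason voxel-wise and regional RLogan estimates can differ.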
To allow comparability across methods, BPND + 1 values (here referred to as DVR) were computed for methods with BPND as outcome parameter. Next, the following cortical grey matter volumes of interest (VOIs) were superimposed on the parametric images to extract regional values for comparison with those estimated using the regional standard: anterior and posterior cingulate gyrus, middle and orbitofrontal gyrus, inferior and superior frontal gyrus, gyrus rectus, pre- and postcentral gyrus, superior parietal gyrus, (infero)lateral remainder of the parietal lobe, insula, cuneus, lateral remainder of the occipital lobe, medial and lateral anterior temporal lobe, posterior temporal lobe, superior, middle and inferior temporal gyrus, lingual gyrus, parahippocampal and ambient gyrus, and fusiform gyrus. Finally, a volume-weighted global cortical average was calculated from all regional data.
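The two final steps above, converting BPND to DVR and averaging regions into a volume-weighted global cortical value, reduce to simple arithmetic. The sketch below assumes regional values and VOI volumes are already available as plain lists; the function names are hypothetical and the numbers are made up for illustration, not study data.

```python
# Illustrative sketch: convert BPND to DVR and compute a volume-weighted
# global cortical average. Values and volumes are made up for the example.


def bpnd_to_dvr(bpnd):
    """DVR = BPND + 1, as used to make outcome measures comparable."""
    return bpnd + 1.0


def volume_weighted_average(values, volumes):
    """Average of regional values weighted by their VOI volumes."""
    if len(values) != len(volumes):
        raise ValueError("values and volumes must have the same length")
    total = sum(volumes)
    if total <= 0:
        raise ValueError("total volume must be positive")
    return sum(v * w for v, w in zip(values, volumes)) / total


regional_bpnd = [0.40, 0.20, 0.10]   # hypothetical regional BPND values
voi_volumes_ml = [10.0, 30.0, 60.0]  # hypothetical VOI volumes (mL)
regional_dvr = [bpnd_to_dvr(b) for b in regional_bpnd]
print(round(volume_weighted_average(regional_dvr, voi_volumes_ml), 4))  # 1.16
```

Weighting by volume means large VOIs dominate the global value; an unweighted mean would give each region equal influence regardless of size.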

Statistical analysis
All statistical analyses were performed in IBM SPSS Statistics for Windows version 26.0 (IBM Corp., Armonk, New York, USA) or GraphPad Prism for Windows version 7.04 (La Jolla, California, USA). These analyses can be divided into three groups: 1) assessment of population equivalence between [18F]flutemetamol and [18F]florbetaben scans, 2) visual assessment of parametric images, and 3) quantitative assessment of the performance of parametric methods compared with the regional standard.

Population equivalence
First, potential between-tracer differences in age and mini-mental state examination (MMSE) scores were assessed using nonparametric Mann-Whitney U tests, while potential differences in the proportion of males and females were assessed using a chi-square test. Subsequently, age and MMSE scores were compared between Aβ-positive and Aβ-negative subjects within each tracer group, using the same tests as described above.
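For reference, a 2 × 2 chi-square test of sex proportions between tracer groups can be written in a few lines. This sketch uses the uncorrected Pearson statistic (the text does not say whether a continuity correction was applied) and the fact that, for one degree of freedom, the upper-tail probability equals erfc(√(χ²/2)). The function names and the counts in the example are hypothetical.

```python
import math

# Sketch of a Pearson chi-square test for a 2 x 2 table (e.g. sex by tracer
# group), without continuity correction. Counts below are hypothetical.


def chi_square_2x2(table):
    """Return the chi-square statistic for [[a, b], [c, d]].

    All expected counts must be non-zero (no empty rows or columns).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Rows: tracer groups; columns: males, females (hypothetical counts).
counts = [[12, 12], [10, 12]]
stat = chi_square_2x2(counts)
print(round(stat, 3), round(p_value_df1(stat), 3))
```

With small expected counts, an exact test or a continuity-corrected statistic may be preferred; the sketch deliberately keeps the plain Pearson form.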

Visual assessment of parametric images
Image artifacts were identified first visually and subsequently by calculating the percentage of (extreme) outliers within total grey matter, using thresholds defined on the basis of expected clinical values: 2.00 for BPND images and 3.00 for DVR and SUVR images (Heurling et al., 2015; Becker et al., 2013). Values above these thresholds were considered outliers. To demonstrate the robustness of the analyses, a more lenient and a more stringent threshold were also applied, i.e. 1.80 and 2.20 for BPND images and 2.80 and 3.20 for DVR and SUVR images.
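The outlier criterion above amounts to counting grey matter voxels whose value exceeds a threshold. A minimal sketch, assuming the parametric image and the grey matter mask have been flattened into equally long lists (a real implementation would typically operate on image arrays); the threshold values are those given in the text, while the function name and voxel values are hypothetical.

```python
# Sketch of the outlier criterion: percentage of grey-matter voxels whose
# parametric value exceeds a threshold. Voxel data are flattened lists here
# (an assumption for brevity); thresholds are those given in the text.

THRESHOLDS = {
    "BPND": (1.80, 2.00, 2.20),
    "DVR": (2.80, 3.00, 3.20),
    "SUVR": (2.80, 3.00, 3.20),
}


def outlier_percentage(values, gm_mask, threshold):
    """Percentage of voxels inside the grey-matter mask above threshold."""
    gm_values = [v for v, in_gm in zip(values, gm_mask) if in_gm]
    if not gm_values:
        raise ValueError("grey-matter mask is empty")
    n_outliers = sum(1 for v in gm_values if v > threshold)
    return 100.0 * n_outliers / len(gm_values)


# Hypothetical DVR voxels; the last voxel lies outside grey matter.
dvr_voxels = [1.0, 3.5, 2.9, 3.1, 5.0]
gm_mask = [True, True, True, True, False]
for thr in THRESHOLDS["DVR"]:
    print(thr, outlier_percentage(dvr_voxels, gm_mask, thr))
# prints: 2.8 75.0 / 3.0 50.0 / 3.2 25.0
```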

Quantitative performance of parametric methods
The performance of each parametric method was first determined by assessing differences in global cortical DVR, SUVR and R1 between Aβ-positive and Aβ-negative scans. This was assessed using one-way analysis of variance (ANOVA) and the area under the curve (AUC) of a receiver operating characteristic (ROC) curve, which indicates how well a parameter differentiates between Aβ-positive and Aβ-negative scans.
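The ROC AUC used here has a simple probabilistic reading: it equals the probability that a randomly chosen Aβ-positive scan has a higher value than a randomly chosen Aβ-negative scan, with ties counted as one half (the Mann-Whitney formulation of the empirical AUC). The sketch below assumes that higher DVR or SUVR indicates Aβ positivity; it is not the SPSS or GraphPad routine used in the study, and the function name and values are hypothetical.

```python
# Sketch of the empirical ROC AUC via its Mann-Whitney formulation: the
# probability that a randomly chosen Aβ-positive scan has a higher value
# than a randomly chosen Aβ-negative scan (ties count as 1/2).


def roc_auc(positive, negative):
    """Empirical AUC, assuming higher values indicate Aβ positivity."""
    if not positive or not negative:
        raise ValueError("both groups must be non-empty")
    score = 0.0
    for p in positive:
        for n in negative:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(positive) * len(negative))


# Hypothetical global cortical DVR values.
dvr_pos = [1.5, 1.6, 1.3]
dvr_neg = [1.1, 1.2, 1.4]
print(round(roc_auc(dvr_pos, dvr_neg), 3))  # 0.889 (8 of 9 pairs ordered correctly)
```

The pairwise loop is quadratic in group size, which is negligible for cohorts of this size; a rank-based computation gives the same value more efficiently for large samples.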
Next, linear regression analyses, corrected for age, sex and visual status, were used to assess the correlation (R²) between binding estimates obtained using each parametric method and the regional standard. In addition, Bland-Altman analyses were used to assess bias relative to the regional standard, the variability of the data based on the 95% limits of agreement, and the presence of proportional bias. Proportional bias was quantified by fitting a regression line through the Bland-Altman plot.
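A minimal Bland-Altman sketch is shown below. Because the text does not specify these details, the following are assumptions: differences are computed as method minus regional standard, the 95% limits of agreement are bias ± 1.96 SD of the differences, and proportional bias is the least-squares slope of the differences against the pairwise means. Expressing bias as a percentage, as in the Results, would additionally require normalizing the differences (e.g. by the reference value); that step is omitted here. The function name and data are hypothetical.

```python
import statistics

# Minimal Bland-Altman sketch. Assumptions: difference = method - reference,
# 95% limits of agreement = bias +/- 1.96 SD, and proportional bias = OLS slope
# of the differences against the pairwise means.


def bland_altman(method, reference):
    """Return bias, 95% limits of agreement and proportional-bias slope."""
    if len(method) != len(reference) or len(method) < 3:
        raise ValueError("need at least three paired values")
    diffs = [m - r for m, r in zip(method, reference)]
    means = [(m + r) / 2.0 for m, r in zip(method, reference)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    mean_x = statistics.mean(means)
    sxx = sum((x - mean_x) ** 2 for x in means)
    slope = sum((x - mean_x) * (d - bias) for x, d in zip(means, diffs)) / sxx
    return bias, limits, slope


# Hypothetical regional DVRs: a method reading 10% high relative to the reference.
reference = [1.0, 1.2, 1.4, 1.6, 1.8]
method = [1.1 * r for r in reference]
bias, limits, slope = bland_altman(method, reference)
print(round(bias, 3), round(slope, 3))  # 0.14 0.095 (proportional bias)
```

A slope clearly different from zero indicates that the disagreement grows with the level of the measurement, i.e. a bias that depends on underlying amyloid burden.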

Population equivalence
There were no significant between-tracer differences in the participants' age, proportion of males and females or average MMSE scores (Table 1). Within each tracer group, MMSE scores were higher for Aβ-negative than for Aβ-positive participants ([18F]flutemetamol: p = 0.014; [18F]florbetaben, at trend level: p = 0.061), while there were no differences in age or proportion of males and females (Table 1).

Visual assessment of parametric images
Representative parametric BPND, DVR, SUVR and R1 images of both Aβ-positive and Aβ-negative scans are presented in Fig. 1 for each tracer. Visual inspection of parametric BPND, DVR and SUVR images showed that, for both tracers, most image artifacts caused by outliers (i.e. speckles) were present for MRTM2 and MRTM1 (Fig. 1). Visually, no artifacts were observed in parametric R1 images, and differences in R1 between Aβ-positive and Aβ-negative scans were small. Furthermore, the […] Table S2b). For both tracers, results showed a very similar pattern using the alternative thresholds (supplementary Tables S2a and S2b).

Quantitative performance of parametric methods
For both tracers, all parametric methods showed a higher Aβ burden (as reflected by DVR or SUVR values) in the Aβ-positive group (p < 0.05). RPM- and SRTM2-derived DVR, as well as SUVR, yielded the highest AUC values (Table 2). As expected, for both tracers there were no significant differences in R1 between Aβ groups, and AUC values of RPM- and SRTM2-derived R1 were low (Table 2).
For [18F]flutemetamol, linear regression analyses yielded high correlations (R² ≥ 0.87) between outcome measures from all methods and regional standard-derived DVR, except for MRTM2 (R² = 0.71; Table 3 and Fig. 2A). Furthermore, compared with the regional standard, MRTM2-derived DVR showed a constant, average overestimation of 7%, as shown by the Bland-Altman analyses (Fig. 3A). In contrast, SUVR and parametric RLogan-derived DVR showed biases of 10% and −5%, respectively, compared with the regional standard, which were proportional to the underlying amyloid burden (SUVR slope = 0.49, p < 0.001; parametric RLogan slope = −0.20, p < 0.001) (Fig. 3A). Proportional bias was also observed for RPM and SRTM2 (RPM slope = 0.23, p < 0.001; SRTM2 slope = 0.15, p < 0.001). Finally, Bland-Altman analyses showed the largest variability for MRTM2- and RPM-derived DVR and for SUVR (Fig. 3A).

Discussion
In this study, various parametric methods for voxel-wise analysis of dual-time window [18F]flutemetamol and [18F]florbetaben studies were evaluated with respect to three aspects: 1) visual assessment of parametric images, 2) ability to differentiate between Aβ-positive and Aβ-negative scans, and 3) quantitative accuracy and precision. All evaluated methods could differentiate between Aβ-positive and Aβ-negative scans based on DVR or SUVR measures, and all except MRTM2 showed high correlations with regional standard-derived DVR. However, the quality of parametric images and the level of bias compared with the regional standard varied substantially across methods. Hence, these findings demonstrate that the most appropriate parametric method depends on the research or clinical question to be addressed. For both tracers, high correlations (R² ≥ 0.87) were observed between DVR derived using the regional standard and the outcome measures of all parametric methods, except for MRTM2, which had relatively lower correlations ([18F]flutemetamol: R² = 0.71; [18F]florbetaben: R² = 0.80) and showed a constant, average DVR overestimation of 7% for [18F]flutemetamol and 11% for [18F]florbetaben compared with the regional standard. When interpreting the bias or accuracy of the results, it is important to consider whether the bias is proportional to the underlying Aβ burden. In contrast to a constant bias, a proportional bias cannot easily be accounted for in longitudinal studies and therefore poses challenges, in particular when aiming to measure small changes over time. In that regard, although small on average (max. 5%, except for [18F]flutemetamol SUVR), the bias observed with the other methods was proportional to the underlying Aβ burden, especially for SUVR, RPM and RLogan. For SRTM2 specifically, a small proportional bias was observed for [18F]flutemetamol, whereas a constant, average underestimation of 7% was seen for [18F]florbetaben.
Table note (fragment): Regional RLogan was used as independent variable, and linear regression analyses were corrected for age, sex and visual status.
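Why a constant bias is less problematic than a proportional one in longitudinal studies can be shown with a short derivation. It assumes, as a simplification, that the estimated DVR is a linear function of the true DVR with the same coefficients at both time points; c and b are illustrative symbols, not quantities estimated in this study.

```latex
\begin{aligned}
\text{constant bias:}\quad
  & \widehat{\mathrm{DVR}} = \mathrm{DVR} + c
  && \Rightarrow\quad \Delta\widehat{\mathrm{DVR}} = \Delta\mathrm{DVR},\\
\text{proportional bias:}\quad
  & \widehat{\mathrm{DVR}} = c + b\,\mathrm{DVR},\ b \neq 1
  && \Rightarrow\quad \Delta\widehat{\mathrm{DVR}} = b\,\Delta\mathrm{DVR}.
\end{aligned}
```

For example, with b = 0.9 a true increase of 0.10 in DVR would be measured as 0.09. Because b differs between methods and is generally unknown, such a scaling cannot simply be subtracted out, unlike the offset c.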
The amyloid-dependent bias reported above could be explained by a variety of factors. One possible explanation is the assumption of single-tissue compartment kinetics in both target and reference tissues, which is made for RPM and SRTM2, but not for the MRTM implementations (Gunn et al., 1997; Lammertsma and Hume, 1996; Wu and Carson, 2002; Ichise et al., 2003). In fact, it has been reported that, for these tracers, reference tissue kinetics are better described by a two-tissue compartment (2TC) model (Becker et al., 2013; Nelissen et al., 2009), which could explain the amyloid-dependent bias for these methods, as demonstrated by Salinas and colleagues (Salinas et al., 2015). This assumption may be particularly problematic for regions with high Aβ burden, which would explain the increase in bias illustrated in Fig. 3. With respect to RLogan, the voxel-wise method showed marked differences from its regional counterpart. Since all other aspects (e.g. linearization start time, interpolation method for missing data points and omission of k2') were identical between both implementations, these differences are likely to be primarily related to the general impact of noise on parametric imaging methods. In particular, it has been shown that the underestimation of amyloid burden by RLogan-derived DVR is more pronounced at higher noise levels and in regions with higher binding (Slifstein and Laruelle, 2000). Therefore, the dependence of the parametric RLogan bias on the underlying Aβ burden is plausibly a result of the interaction between higher noise levels and higher Aβ burden (Slifstein and Laruelle, 2000).
In general, the present results are in line with previous amyloid PET studies, although for some methods, such as RPM and SUVR in the case of [18F]flutemetamol, the bias was larger than previously reported (Heurling et al., 2015). For example, previous work has demonstrated an overestimation by SUVR compared with BPND or DVR obtained using plasma input modelling (the gold standard) (van Berckel et al., 2013; Verfaillie et al., 2021; Heurling et al., 2015; Becker et al., 2013; Ottoy et al., 2017). Since the reference method used in the present study (regional RLogan) is known to underestimate true binding (van Berckel et al., 2013; Verfaillie et al., 2021; Heurling et al., 2015; Heeman et al., 2020), this could explain the larger bias observed for SUVR and RPM compared with that reported by Heurling and colleagues (Heurling et al., 2015). However, given that the regional implementation of RLogan is less sensitive to noise than its parametric counterpart, and that the Aβ burden covered by the cognitively unimpaired participants is expected to be slightly lower than that typically observed in AD patients, only a limited impact on the results was expected. With respect to both quantitative and visual performance of the various MRTM implementations, considerable between-tracer differences have been reported previously (Yaqub et al., 2008; Verfaillie et al., 2021; Heurling et al., 2015; Becker et al., 2013). These differences in performance may be explained by the application (or absence) of pre- or post-reconstruction smoothing filters and by differences in scanner resolution (Yaqub et al., 2008; Verfaillie et al., 2021; Heurling et al., 2015; Becker et al., 2013).
Finally, small differences in the performance of parametric methods compared with previous reports can also be attributed to specific choices in processing and analysis pipelines, such as differences in SUVR uptake times, both starting and boundary values of fitting parameters, reference tissue selection and/or the inclusion of subcortical regions ( Heurling et al., 2015 ;Becker et al., 2013 ). Of note, for a few data points, large differences were observed between MRTM2 and the regional standard in case of [ 18 F]flutemetamol ( Fig. 3 A). All these data points belonged to a single subject and no large differences were observed for any of the other methods. Unfortunately, there is no obvious reason why data for this subject were different only for MRTM2.
Overall, the aim of this study was to evaluate different parametric methods with respect to three aspects. Regarding the first aspect, visual assessment of parametric images, few image artifacts were observed for RLogan, RPM, SRTM2, SUVR and MRTM0, facilitating visual assessment. Furthermore, with respect to tracer distribution, RPM and SRTM2 images, followed by MRTM0 images, appear most suitable. On the other hand, SUVR images showed high uptake in both grey and white matter, increasing the risk of false positives, while RLogan images showed the clearest underestimation of Aβ burden, increasing the risk of false negatives, especially for [18F]flutemetamol (Fig. 1). Therefore, RPM and SRTM2 appear to be the methods of choice for visual assessment, followed by MRTM0. The second aspect was the ability to differentiate between Aβ-positive and Aβ-negative scans. For both tracers, all methods detected significant differences between Aβ-positive and Aβ-negative scans based on DVR or SUVR measures and were therefore considered suitable for this purpose. However, RPM, SRTM2 and SUVR provided the highest AUC values, making these the preferred methods for detecting more subtle differences in Aβ burden, followed by MRTM0 and MRTM1. Regarding the third aspect, quantitative accuracy and precision, methods should show a high correlation with, and a low bias (independent of underlying Aβ burden) relative to, the regional standard, as well as low measurement variability. With respect to correlation, all methods except MRTM2 would be considered appropriate, as all showed excellent correlations with the regional standard (R² ≥ 0.87). With respect to bias, only MRTM0 and MRTM1 showed both minimal and constant bias (< 5%), while all other methods showed a larger and/or proportional bias to some degree. Finally, the lowest variability, as shown by the limits of agreement of the Bland-Altman analyses (Fig. 3), was observed for MRTM0 and MRTM1, followed by parametric RLogan, SRTM2 and RPM in the case of [18F]florbetaben. Therefore, MRTM0 and MRTM1 appear to be the most suitable methods with respect to accuracy and precision.
Fig. 3 (caption fragment): […] the absolute difference between the outcome measure of each parametric method and regional RLogan DVR for all regions of interest (ROIs). ROIs are colour-coded based on the Aβ status of the scan they belong to, i.e. Aβ-positive = red, Aβ-negative = blue. The dashed horizontal line corresponds to the average bias, the dotted horizontal lines to the upper and lower 95% limits of agreement for all scans, and the solid line to the linear regression of the Bland-Altman data points. *** p < 0.001, ** p < 0.01, * p < 0.05.
Of the methods preferred for the first and second aspects, RPM and SRTM2 have the additional benefit of providing a measure of relative perfusion (R1), which the MRTM implementations do not. In the present study, R1 yielded only low AUC values for differentiating between Aβ-positive and Aβ-negative scans. However, this was expected, as a cognitively unimpaired population was studied, and changes in neuronal function and cerebral blood flow (CBF) tend to manifest at a later stage of the disease than Aβ accumulation (Jack et al., 2010).
A limitation of this work is the lack of a direct comparison with the gold standard, plasma input modelling, for determining quantitative accuracy and precision; future studies could address this. Large differences are not expected, however, as several studies have shown that reference tissue approaches are adequate for quantification compared with the (plasma input) gold standard (Verfaillie et al., 2021; Heurling et al., 2015; Price et al., 2005). In addition, good results have previously been reported for RLogan in terms of correlation and bias relative to the gold standard, as well as robustness against confounding factors (Heeman et al., 2021; Nelissen et al., 2009). Furthermore, the performance of the parametric methods was evaluated in cognitively unimpaired subjects only. To cover a range of Aβ burden, 50% Aβ-positive and 50% Aβ-negative subjects were selected. Nonetheless, the upper end of this range may be slightly lower than that typically observed in patients with AD dementia. It should be noted, however, that subjects scanned with a dual-time window protocol are unlikely to include patients with severe cognitive impairment, given that these are not the target population for clinical trials or studies measuring disease progression; the present range of Aβ burden was therefore considered appropriate for the goal of this study.

Conclusions
The preferred parametric methods for voxel-based amyloid quantification of [ 18 F]flutemetamol and [ 18 F]florbetaben dual-time window studies differed per evaluated aspect, but were relatively comparable between tracers. Compared with the reference standard method, regional RLogan, the best compromise across aspects was found for MRTM0, followed closely by SRTM2 which has the advantage of also providing R 1 . Given the current interest in R 1 , SRTM2 would be the preferred parametric method.

Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: this project received funding from the EU/EFPIA Innovative Medicines Initiative (IMI) Joint Undertaking (EMIF grant 115372) and the EU-EFPIA IMI-2 Joint Undertaking (grant 115952). This joint undertaking receives support from the European Union's Horizon 2020 research and innovation program and EFPIA. This communication reflects the views of the authors and neither IMI nor the European Union and EFPIA are liable for any use that may be made of the information contained herein. FB is supported by the NIHR biomedical research centre at UCLH. Juan D. Gispert holds a 'Ramón y Cajal' fellowship (RYC-2013-13054) from the Spanish Ministry of Science, Innovation and Universities. The funding sources were not involved in any of the following aspects of the present study: design, data collection, analyses or interpretation, writing of the report, decision to submit the article for publication.