Adapting UK Biobank imaging for use in a routine memory clinic setting: The Oxford Brain Health Clinic

Highlights
• We adapted UK Biobank brain MRI for routine use in a memory clinic.
• The acquisition protocol is well tolerated and provides high-quality data.
• The modified analysis pipeline improves grey matter and hippocampus segmentation.
• Volumes are aligned with radiology reports and associated with age and cognition.
• Data from older healthy controls are needed to improve reference distributions.


Introduction
Brain magnetic resonance imaging (MRI) plays a key role in the diagnosis and evaluation of patients suspected of having dementia (Filippi et al., 2012;Staffaroni et al., 2017;Vernooij et al., 2019). Atrophy, specifically medial temporal lobe atrophy, has been included in the diagnostic guidelines for Alzheimer's since 2011 (McKhann et al., 2011), and distinct atrophy patterns can point towards a specific underlying diagnosis of dementia subtype (Frisoni et al., 2010;McKhann et al., 2011;Staffaroni et al., 2017). More recently, there has been increasing awareness of the importance of vascular pathology as another key threat to cognitive health (Dickie et al., 2018;Gorelick et al., 2011;Smith et al., 2019).
Despite our understanding of structural changes in dementia, neuroimaging modalities are under-exploited in the clinical setting. In the UK, most people with memory problems (above 65 years old and with normal presentation) are referred to psychiatry-based memory clinics. These services typically do not have access to the same assessments (MRI and neuropsychology) as more specialist neurology-based clinics. Rather than to diagnose dementia, CT is most commonly used to exclude other pathologies, which account for a small proportion of cases (Scheltens et al., 2002;van Straaten et al., 2004). Specific atrophy patterns and the presence of vascular pathology are better assessed with MRI compared with CT (Scheltens et al., 2002;van Straaten et al., 2004).
The results of brain scans are usually reported by a radiologist without the benefit of a standardised dementia-specific structure, leading to highly variable content across patients. Consortia and working groups are making an effort to standardise the structure of radiology reports, include information about vascular pathology, and often suggest the use of semi-quantitative scales (Filippi et al., 2012;Smith et al., 2019;Vernooij et al., 2019). Including quantitative measures extracted from MRI in radiology reports has the potential to increase the accuracy of dementia diagnosis and prognosis (Bosco et al., 2017;Vernooij et al., 2018), reduce inter-rater variability, and improve workflows (Goodkin et al., 2019). However, despite being widely used in dementia research, MRI and quantitative measures derived from MRI are not commonly used in routine clinical assessments.
In research settings, various neuroimaging software suites have been developed and widely implemented to derive quantitative measures from brain MRI (e.g. cortical and hippocampal volume, volume of white matter hyperintensities, WMHs). More recently, with imaging now being more regularly included in large population cohorts, there are efforts to standardise these measurements and make them interpretable by non-imaging experts to maximise the usability of brain health information. A good example is the UK Biobank (UKB) study, which will ultimately include multi-organ (including brain) imaging for 100,000 participants. This can be combined with lifestyle, health and genetic data to produce predictive models for late life brain health (Littlejohns et al., 2020;Miller et al., 2016). The brain MRI UKB image processing pipeline (Alfaro-Almagro et al., 2018) automatically extracts thousands of measurements, so-called imaging derived phenotypes (IDPs). The pipeline uses tools largely from the FSL and FreeSurfer software libraries, which are used by thousands of research laboratories worldwide. However, these tools have been validated primarily in a research context. The applicability of these tools and the UKB pipeline to an unselected clinical population remains unclear.
The Oxford Brain Health Clinic (BHC) (O'Donoghue et al., 2022a) aims to address this disconnect between the research and clinical domains, whilst providing an ideal clinical setting to validate MRI analysis tools. A joint clinical-research service that opened in August 2020, the BHC augments current NHS psychiatry-led memory clinic services to give patients and clinicians access to high-quality assessments not routinely available, including MRI instead of CT. MRI sequences used in the BHC are identical to, or compatible with, the UK Biobank imaging study (Miller et al., 2016), enabling us to explore the alignment of BHC results with the larger UKB imaging dataset.
Our ambition to incorporate high quality MRI acquisition and analysis is shared by a number of centres of excellence, research studies, and by a growing number of clinical research organisations and spinouts. Notable examples are the Quantitative Neuroradiology initiative (Goodkin et al., 2019) and the work by Biondo and colleagues (Biondo et al., 2022), using routine T1-weighted images from memory clinic patients in South London and Maudsley. A number of multi-site initiatives are also in progress (e.g. the Cambridge-led Quantitative MRI in NHS Memory Clinics study). The potential commercial opportunity is being exploited by a growing number of companies (e.g. Icometrix, Brainminer, Oxford Brain Diagnostics, Cortechs.ai, AINOSTICS, just to name a few). The specific contribution of the Oxford BHC relates to the population (non-selected clinical referrals from a psychiatry-led service), the research consenting model (patient choice with a very high consent rate) and the adaptation of the UK Biobank MRI acquisition and analysis methodology, which directly aligns this late-life clinical population with the largest epidemiological imaging study in the world. The population and consenting model are described elsewhere (O'Donoghue et al., 2022a), while this paper focuses on the methodological challenges of aligning brain MRI data between BHC and UKB. In particular, we sought to address three questions: 1) is the UKB acquisition protocol well tolerated by patients? 2) is the UKB pipeline robust and generalisable to a real-life memory clinic population? 3) can UKB data be used as a reference population for memory clinic patients?
In this work we describe how we have adapted the UKB protocol for use in the BHC and evaluated the tolerability of the BHC protocol for memory clinic patients both in terms of compliance and image quality. We assess the performance of the UKB analysis pipeline and optimise the necessary automated segmentation tools for use with this patient group. Finally, we compare the characteristics of BHC patients with UKB participants and discuss challenges and opportunities of using UKB as normative distributions and implementing quantitative measures in BHC reports.

Patient population
Patients from Oxfordshire who have been referred to Oxford Health NHS Foundation Trust Memory Clinics are triaged by the duty psychiatrist (RM) for referral to the Brain Health Clinic for assessments prior to their memory clinic appointment. Selection at the triage stage is based on clinical need and MRI safety screening. The GP referral and notes are examined for evidence of MRI safety contraindications and the duty psychiatrist speaks to the patient by phone to assess suitability for scanning. There is no explicit age cut-off for a BHC referral, but a clinical decision is made about the benefit of a BHC visit for patients who are deemed too frail for advanced scanning or where dementia is well established.
At the BHC, patients undergo extensive brain health assessments, including cognitive assessments, physiological measurements, questionnaires, and a 3 T MRI scan.
Patients are also offered three ways to take part in research: 1) to share their clinical data for research use, 2) to undergo additional assessments during their visit and 3) to be informed about future opportunities to participate in studies and trials. All research is optional, and patients who choose not to take part in research still complete the NHS assessment at the BHC. The BHC Research Database was reviewed and approved by the South Central - Oxford C research ethics committee (SC/19/0404).
For more details on the non-imaging assessments performed and the research consent process, please refer to (O'Donoghue et al., 2022a).

BHC MRI protocol
The UKB brain MRI protocol was optimised to produce high-quality images in as short a time as possible, to accommodate the large number of participants (Miller et al., 2016), making it an ideal candidate as a clinical protocol. The BHC brain MRI protocol was designed to match the UKB protocol as closely as possible, with some adaptations intended to make the scan more tolerable for memory clinic patients and to prioritise the collection of images that are currently most useful to aid dementia diagnosis. Table 1 illustrates the BHC MRI protocol compared with the UKB protocol. The full protocol details are openly available on the WIN MR Protocols Database (O'Donoghue et al., 2022b).
The scanner used in UKB is a Siemens Skyra 3 T running software platform VD13A, with a 32-channel RF receive head coil. The BHC protocol was set up on a Siemens Prisma 3 T running VE11C, with a 32-channel head coil. The Prisma is a slightly newer, higher-specification scanner than the Skyra and runs a newer software platform, so it is able to run the UKB sequences and protocol.

Subdivision into clinical and research sequences
Of the MRI modalities included in the UKB protocol, some are routinely used in clinical settings, while others are currently used mainly for research purposes.
Leveraging the joint clinical-research nature of the BHC we divided the UKB protocol into two sets of sequences: a core clinical protocol including the sequences that are compatible with current radiological examinations of memory clinic patients, and a research protocol that patients can opt-in to receive if they consent to additional research assessments.

BHC core clinical protocol
The core clinical protocol has a total acquisition time of 16:29 mins and includes a T1-weighted scan, a T2-weighted Fluid Attenuated Inversion Recovery (FLAIR) scan, a diffusion-weighted scan (dMRI) and a susceptibility-weighted (swMRI) scan.
The T1 and T2-FLAIR scans are exactly matched with UKB protocol. These high-resolution structural scans (1 mm isotropic) allow clear depiction of brain anatomy, with high contrast between grey and white matter (T1) and highlight alterations to tissue compartments typically associated with pathology (T2-FLAIR). Neither the BHC nor the UKB protocol include a T2-weighted (non-FLAIR) scan. This decision was a compromise between scanning time and clinical utility. T2-FLAIR and T2-weighted are very similar sequences, both in contrast and scan duration. T2-FLAIR offers better contrast than a T2-weighted scan for white matter pathology and infarcts, which are very important in the context of dementia diagnosis. The utility of T2-weighted scans would be more for non-neurodegenerative pathology and infratentorial or thalamic infarcts. In order to maximise the amount of clinically relevant information in a short scan, the preference was to include a T2-FLAIR and not a T2-weighted scan.
We then acquire a short dMRI scan (43 s) with just 3 orthogonal diffusion directions, as commonly done in clinical practice to evaluate mean diffusivity and detect areas of restricted diffusion for the assessment of ischaemic injury or prion disease.
The susceptibility-weighted sequence was modified from the UKB protocol to obtain higher resolution images (1.5 mm slice thickness vs 3 mm in UKB protocol), useful for the assessment of venous vasculature, microbleeds or aspects of microstructure (e.g., iron, calcium and myelin).

BHC research protocol
The research protocol includes the UKB dMRI and resting state functional MRI (rfMRI) sequences, and an arterial spin labelling (ASL) sequence. The acquisition time for these optional research sequences is 21:17 mins.
The UKB dMRI sequence includes 100 diffusion-encoding directions across 2 b-shells (50 × b = 1000 s/mm², 50 × b = 2000 s/mm²) with a multiband acceleration factor of 3. Compared with the clinical 3-direction dMRI scan, this sequence enables finer measurement of the random motion of water molecules to infer information about white matter (WM) microstructural properties and delineate the gross axonal organisation of the brain.
The UKB rfMRI sequence includes 490 timepoints (volumes) acquired with 2.4-mm isotropic spatial resolution and TR = 0.735 s, with multiband acceleration factor of 8. Resting-state functional MRI measures changes in blood oxygenation associated with intrinsic brain activity (i.e., in the absence of an explicit task or sensory stimulus). During resting-state scans, subjects are instructed to look at a crosshair, blink normally and try not to fall asleep.
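As a quick sanity check on the parameters quoted above, the nominal rfMRI acquisition time can be worked out from the number of volumes and the TR. This is only an arithmetic sketch: the function name is illustrative, and it ignores any dummy or calibration volumes the scanner may add.

```python
# Sequence parameters quoted in the text above.
N_VOLUMES = 490
TR_SECONDS = 0.735


def acquisition_seconds(n_volumes: int, tr_seconds: float) -> float:
    """Nominal duration of a volume-by-volume acquisition (no dummy scans)."""
    return n_volumes * tr_seconds


duration = acquisition_seconds(N_VOLUMES, TR_SECONDS)
minutes, seconds = divmod(round(duration), 60)
print(f"{duration:.2f} s, i.e. about {minutes}:{seconds:02d} min")
```

Under these assumptions the resting-state scan takes roughly six minutes of the research protocol.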
We did not include the task fMRI sequence from UKB as it was not designed for this type of population. Instead, given the impact of vascular risk factors and vascular pathology on dementia (de la Torre, 2012;Iturria-Medina et al., 2016), it was deemed particularly important to look at vascular brain health in the BHC. We therefore included an Arterial Spin Labelling (ASL) sequence to look at brain perfusion, preceded by a time of flight (TOF) acquisition to localise neck vessels. We used a 2D multi-slice EPI readout, with a pseudo-continuous ASL tag duration of 1400 ms, seven post-labelling delays (PLDs = 250, 500, …, 1750 ms), seven label/control pairs at each PLD, and one calibration image. During these scans, subjects are also instructed to look at a crosshair, blink normally and try not to fall asleep. The original UKB protocol did not include ASL; however, this modality has recently been collected in the UKB COVID-19 study participants.

Raw scan quality control
We visually inspected the raw images to identify low-quality scans, both to give an indication of the tolerability of the protocol (e.g., motion artifacts that would be indicative of the ability of a memory clinic patient to lie still in the scanner) and to identify scans that might be informative about the robustness of the analysis pipeline.

Radiology reports
All clinical scans are reported by an experienced radiologist (PP) in a standardised way, following the guidelines that emerged from a survey of the European Society of Neuroradiology (ESNR). Table 2 describes the fields of the structured report, as well as the variables that were derived from it for the patients who consented to the use of clinical data for research.

BHC MRI analysis: Adapting the UKB pipeline to memory clinic population
The second component of UKB imaging implemented in the BHC is the image analysis pipeline. The pipeline (Alfaro-Almagro et al., 2018) automatically processes the images and extracts imaging derived phenotypes (IDPs), standardised quantitative measures that can be easily used and interpreted by non-imaging experts as well. Because of the well-matched acquisition protocol, we were able to apply the pipeline, normally used only in research, to extract quantitative information from clinical scans of BHC patients.
In this work we focused on obtaining the most clinically useful IDPs for dementia diagnosis (i.e. those relevant to the sections included in the standardised radiology report), from amongst the IDPs currently generated by the pipeline (Alfaro-Almagro et al., 2018). We therefore analysed T1-weighted scans and T2-FLAIR scans and extracted measures of global atrophy (total grey matter volume), hippocampal atrophy (hippocampal volume) and white matter change (volume of periventricular and deep WMHs).
We then tested whether the UKB pipeline applied to a real-life memory clinic population was successful in extracting the measures of interest and, where necessary, performed the necessary adjustments to the analysis pipeline.

Image processing with UKB pipeline
Structural images were analysed with FSL using the UKB analysis pipeline (version 1.5, https://git.fmrib.ox.ac.uk/falmagro/UK_biobank_pipeline_v_1.5) (Alfaro-Almagro et al., 2018). In brief, the main steps of the T1 pipeline that derive the selected IDPs are: gradient distortion correction and defacing, brain extraction, linear and nonlinear registration to standard space, FIRST subcortical structure segmentation (Patenaude et al., 2011), FAST bias field correction and tissue-type segmentation (Zhang et al., 2001) along with a shortened version of SIENAX reliant on FAST (Smith et al., 2002). The T2-FLAIR pipeline includes bias field correction, registration to T1 and standard space, brain extraction by masking with the T1 brain mask, and WMH segmentation with BIANCA, further subdivided into periventricular and deep WMH.

UKB pipeline output quality check and optimisation
For the first 32 patients, visual inspection in FSLeyes was performed for the following stages in the UKB structural pipeline: brain extraction of T1 and T2-FLAIR scans, tissue-type segmentation from FAST/SIENAX (with particular attention to grey matter segmentation), hippocampus segmentation output from FIRST and WMH segmentation output from BIANCA. Each stage was rated as high, medium, or low quality by two raters independently (LG, GG), and the final rating was reached through consensus. Notes were included on the type of inaccuracy if present. These quality checks informed selection of the tools requiring optimisation.
Because this patient population is characterised by increased cortical and medial temporal lobe atrophy and higher white matter hyperintensities load, we identified two analysis steps that required optimisation (Fig. 1).
Grey matter (GM) segmentations (from FAST/SIENAX) were often flagged as inaccurate due to WMHs being misclassified as GM, causing total GM volume to be overestimated. This occurs because larger WMHs, which are likely at a more severe stage, appear as hypo-intensities on T1-weighted scans (Melazzini et al., 2021). To improve the segmentation accuracy, we implemented in the pipeline a modified version of SIENAX (-lm option). With this option WMHs (segmented with BIANCA) are initially excluded from tissue-type segmentation, then added to the final white matter map (since WMHs are, by definition, part of the WM).
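The logic of lesion masking can be illustrated with a toy example. The sketch below is not the SIENAX implementation: the intensity-band classifier is a hypothetical stand-in for FAST, and the voxel values are invented. It only shows why excluding WMH voxels from classification and adding them back as WM lowers the GM count.

```python
from typing import List

CSF, GM, WM = 0, 1, 2


def toy_classify(intensity: float) -> int:
    """Hypothetical stand-in for tissue-type segmentation on a T1-like scale."""
    if intensity < 0.3:
        return CSF
    if intensity < 0.6:
        return GM
    return WM


def segment(intensities: List[float], wmh_mask: List[bool],
            lesion_masking: bool) -> List[int]:
    """Label each voxel; with lesion masking, WMH voxels bypass classification."""
    labels = []
    for value, is_wmh in zip(intensities, wmh_mask):
        if lesion_masking and is_wmh:
            labels.append(WM)  # WMHs are added to the final WM map
        else:
            labels.append(toy_classify(value))
    return labels


# Invented voxels: two WMH voxels are T1-hypointense and look like GM.
intensities = [0.10, 0.50, 0.55, 0.80, 0.50, 0.45, 0.90]
wmh_mask = [False, False, False, False, True, True, False]

default_labels = segment(intensities, wmh_mask, lesion_masking=False)
masked_labels = segment(intensities, wmh_mask, lesion_masking=True)
print("GM voxels without lesion masking:", default_labels.count(GM))
print("GM voxels with lesion masking:", masked_labels.count(GM))
```

In this toy case the two hypointense WMH voxels inflate the GM count unless the lesion mask is applied, which mirrors the overestimation described above.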
Hippocampal segmentations from FIRST were sometimes flagged as inaccurate in the quality checks. Since by design FIRST does not explicitly avoid the inclusion of non-grey matter tissues within its boundaries, one common error in this population with larger amounts of medial temporal lobe atrophy was the inclusion of cerebrospinal fluid (CSF) regions in the hippocampus. We therefore attempted CSF-masking to improve segmentation accuracy. In brief, this approach takes the CSF partial volume estimate (PVE) map generated by FAST, thresholds it to keep only voxels with high CSF content, and removes CSF areas from the FIRST hippocampal segmentations. The threshold for the CSF PVE map was empirically chosen as the value giving the best trade-off between removing CSF voxels from the hippocampus mask and avoiding undersegmenting the rest of the structure. Optimised tissue-type and hippocampus segmentations were compared to the initial segmentations in FSLeyes and rated as high, medium, or low quality by the same raters (LG, GG; final rating reached through consensus). The most accurate segmentation strategies (FAST/SIENAX with or without lesion-masking and FIRST with or without CSF masking) were carried forward to subsequent analyses.
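A minimal sketch of the CSF-masking step, using flat lists in place of image volumes. The function name and voxel values are hypothetical, and whether the comparison is strict or inclusive at the threshold is an assumption. The sweep covers the 0.6 to 0.9 range from which the final threshold of 0.7 was chosen.

```python
from typing import List


def remove_csf_from_mask(hippo_mask: List[int], csf_pve: List[float],
                         threshold: float = 0.7) -> List[int]:
    """Keep a hippocampal voxel only if its CSF partial volume is below threshold."""
    return [1 if in_mask and pve < threshold else 0
            for in_mask, pve in zip(hippo_mask, csf_pve)]


# Invented example: five voxels inside the FIRST mask, one outside.
hippo_mask = [1, 1, 1, 1, 1, 0]
csf_pve = [0.05, 0.20, 0.65, 0.75, 0.95, 0.90]

for threshold in (0.6, 0.7, 0.8, 0.9):
    kept = sum(remove_csf_from_mask(hippo_mask, csf_pve, threshold))
    print(f"threshold {threshold}: {kept} of {sum(hippo_mask)} voxels kept")
```

A lower threshold removes more voxels and risks undersegmenting the hippocampus, while a higher one leaves more CSF in the mask, which is the trade-off described above.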

Optimised pipeline validation
We performed both an internal and an external validation of the optimised analysis pipeline. Prior to any analyses, all IDPs were normalised for head size using the SIENAX scaling factor to correct for this between-subject variability. IDPs that were not normally distributed (Kolmogorov-Smirnov test) were cube-root transformed. The cube-root transformation was chosen because it produces linear units (i.e. mm instead of mm³) and generally yields a transformed variable with a symmetric distribution.
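The two preprocessing steps can be sketched as follows. This assumes that head-size normalisation means multiplying the raw volume by the subject's volumetric scaling factor; the function names and numbers are illustrative only.

```python
def normalise_for_head_size(volume_mm3: float, scaling_factor: float) -> float:
    """Scale a raw volume by the subject's head-size scaling factor (assumed multiplicative)."""
    return volume_mm3 * scaling_factor


def cube_root_transform(volume_mm3: float) -> float:
    """Map a non-negative volume in mm^3 to a length-like quantity in mm."""
    if volume_mm3 < 0:
        raise ValueError("volumes must be non-negative")
    return volume_mm3 ** (1.0 / 3.0)


# Invented example: a WMH volume of 8000 mm^3 and a scaling factor of 1.25.
normalised = normalise_for_head_size(8000.0, 1.25)
transformed = cube_root_transform(normalised)
print(f"normalised volume: {normalised:.0f} mm^3")
print(f"cube-root transformed: {transformed:.2f} mm")
```

The transform compresses the long right tail typical of WMH volumes, which is why it tends to produce a more symmetric distribution.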
The internal validation was performed by examining the level of agreement between the IDPs extracted from the pipeline with visual ratings from radiology reports. We used ANOVA tests to compare the total GM volume to the global cortical atrophy scale, the left and right hippocampal volumes obtained from FIRST with left and right medial temporal lobe atrophy (MTA) scale (Scheltens et al., 1992), and total WMH, PWMH and DWMH volumes from BIANCA against the Fazekas scale (Fazekas et al., 1987). Where a visual rating score had <5 cases (usually lowest or highest scores), they were grouped with the nearest score for the statistical analyses.
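The grouping rule and the one-way ANOVA can be sketched in plain Python. The analyses in this study were run in SPSS, so this is only an illustration. The function names are hypothetical, the data are invented, and mapping each sparse score to the nearest well-populated score is one possible reading of the grouping rule.

```python
from collections import Counter
from statistics import mean
from typing import List, Sequence


def merge_sparse_scores(scores: Sequence[int], min_cases: int = 5) -> List[int]:
    """Relabel scores with fewer than min_cases to the nearest well-populated score."""
    counts = Counter(scores)
    populated = sorted(s for s, n in counts.items() if n >= min_cases)
    if not populated:
        raise ValueError("no score has enough cases")
    return [s if counts[s] >= min_cases
            else min(populated, key=lambda k: abs(k - s))
            for s in scores]


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> float:
    """F statistic: between-group mean square over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Invented visual rating scores: score 4 has only two cases and joins score 3.
scores = [0] * 6 + [1] * 7 + [2] * 8 + [3] * 6 + [4] * 2
merged = merge_sparse_scores(scores)
print(sorted(set(merged)))

# Invented volumes for three rating categories.
f_stat = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(f"F = {f_stat:.2f}")
```

The F statistic would then be compared against the F distribution with the corresponding degrees of freedom to obtain a p-value.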
External validation was then performed against non-imaging variables. Based on their known associations with WMHs and atrophy, three clinical variables were used for external validation of tool performance: patient age, the Addenbrooke's Cognitive Examination-III (ACE-III) (Hsieh et al., 2013) total score and the ACE-III memory sub-score. Among the tests used in memory clinics (Ballard, 2013), the ACE-III was chosen because it provides a comprehensive assessment of five cognitive domains (attention, memory, language, verbal fluency and visuospatial function) and showed higher accuracy for the diagnosis of mild Alzheimer's disease than other screening tools (Matias-Guiu et al., 2017). Partial correlation (correlation with age controlled for sex, correlation with ACE-III and ACE-III memory score controlled for age and sex) was used to test the relationship between these variables and GM volumes, FIRST hippocampal volumes, and BIANCA WMH volumes, under the hypothesis that strong correlations would indicate high tool performance.
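Partial correlation can be computed with the standard recursive formula, which removes one covariate at a time. The sketch below is a stdlib-only illustration, not the SPSS procedure used in this study; the variable names and toy values are hypothetical.

```python
from math import sqrt
from statistics import mean
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_corr(x: Sequence[float], y: Sequence[float],
                 covariates: List[Sequence[float]]) -> float:
    """Correlation of x and y after controlling for each covariate in turn."""
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[0], covariates[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Toy data: volume and score are related only through their shared
# dependence on age (all values centred and invented).
age = [-1.0, -1.0, 1.0, 1.0]
volume = [-2.0, 0.0, 0.0, 2.0]
score = [-2.0, 0.0, 2.0, 0.0]

print(f"raw correlation: {pearson(volume, score):.2f}")
print(f"controlling for age: {partial_corr(volume, score, [age]):.2f}")
```

In the toy data the raw correlation disappears once age is controlled for, which is the kind of confounding the age and sex adjustment is meant to remove. Passing two covariates, e.g. `[age, sex]`, applies the same formula recursively.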
All statistical analyses were performed in SPSS 27 (IBM) and Bonferroni-corrected for multiple comparisons.
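For reference, the Bonferroni-corrected threshold is the family-wise alpha divided by the number of tests. The sketch below assumes a conventional alpha of 0.05, which is not stated explicitly in the text. The 18 tests correspond to the correlation analyses (six IDPs against three variables).

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Assumed family-wise alpha of 0.05 across the 18 correlation tests.
threshold = bonferroni_threshold(0.05, 18)
print(f"per-test threshold: p < {threshold:.4f}")
```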

Towards quantitative radiology reports for the BHC
Finally, we explored the use of UKB data as a reference population to aid derivation of individual prediction on BHC patients. This would allow incorporating quantitative measures into radiology reports, delivering the enhanced brain information through a decision support tool that is interpretable and useful to clinicians and patients. To this aim, we compared the characteristics of BHC patients with those of the UKB participants, both in terms of demographics and hippocampal volume using previously published nomograms derived from 19,793 generally healthy UKB participants (Nobis et al., 2019).
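The idea of placing a patient within a reference distribution can be sketched with an empirical percentile computed in an age window. This is not the method used to build the published nomograms. The function name, window width and reference values are hypothetical, and the function returns `None` when the patient's age lies outside the reference age range.

```python
from typing import Optional, Sequence, Tuple


def reference_percentile(patient_age: float, patient_volume: float,
                         reference: Sequence[Tuple[float, float]],
                         half_window: float = 2.5) -> Optional[float]:
    """Percentage of age-matched reference volumes below the patient's volume."""
    ages = [age for age, _ in reference]
    if not min(ages) <= patient_age <= max(ages):
        return None  # patient is outside the reference age range
    window = [vol for age, vol in reference
              if abs(age - patient_age) <= half_window]
    if not window:
        return None
    below = sum(vol < patient_volume for vol in window)
    return 100.0 * below / len(window)


# Invented (age, hippocampal volume in mm^3) reference pairs.
reference = [(70.0, 3000.0), (71.0, 3500.0), (72.0, 4000.0), (73.0, 4500.0)]

print(reference_percentile(71.5, 3600.0, reference))
print(reference_percentile(85.0, 3600.0, reference))
```

The second call returns `None` because no reference participant is old enough to provide a comparison for that patient.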

Sample characteristics
From the opening of the BHC in August 2020 to November 2021, 108 patients attended their BHC appointment (O'Donoghue et al., 2022a). As shown in Fig. 2, 92.6 % (N = 100) completed the clinical scans (2 patients were not scanned due to inability to lie in scanner, 2 had safety contraindications on the day, 1 was claustrophobic, and 3 scans were abandoned due to claustrophobia and discomfort in the scanner). Of these, 95 (88.0 %) patients consented to the use of their clinical MRI data for research and are included here. Sixty-nine patients (63.9 %) consented to undergo additional research scans, and forty-seven (43.5 %) completed all research scans. Table 3 includes the characteristics of the 95 patients included, a summary of the findings on the radiology reports and the number of scans available for each modality. Quality checks of raw T1-weighted and T2-FLAIR images overall show that the UKB MRI sequences were suitable for this clinical population (93 % of T1 scans and 94 % of T2-FLAIR scans were high- or medium-quality, Fig. 2). The most common quality issues were motion artifacts.

Radiology reports
A summary of the main findings from the radiology reports is provided in Table 3.
We observed a level of generalised atrophy that was mostly mild or moderate. Of the three severe cases, two had an atrophy level that was reported as moderate to severe, and one had severe asymmetrical atrophy of the temporal lobe with relative sparing of the rest of the brain. Given this small number of severe cases, in subsequent analyses we merged the moderate and severe global cortical atrophy classes into one category (moderate/severe).
The level of hippocampal atrophy was overall moderate (median MTA = 2). The amount of white matter hyperintensities was predominantly moderate in the periventricular areas and mild in the deep white matter. Fig. 3 summarises the levels of hippocampal atrophy (MTA score) and white matter hyperintensities (Fazekas score) for each diagnostic group.
Three patients had a previous haemorrhage, and 8 patients had a previous infarct (one acute, seven chronic). Of the ten patients presenting microhaemorrhages, in four cases there was a single lesion reported, in three cases there were two lesions, and in the remaining cases there were several lesions (more than five). A mass was reported for six patients: in 3 cases it was a cyst, 2 cases of meningioma, and 1 cavernous haemangioma. No cases of prion disease, extra-axial collection or hydrocephalus were reported.

BHC MRI analysis: Pipeline optimisation and validation
The automated analysis pipeline failed on one participant (unsuccessful T1 brain extraction and registrations, preventing further analyses on other modalities), due to high levels of motion in the T1 scan. The scan was also marked as a low-quality image in the raw data QC and the radiology report included a note that images were degraded by movement artifacts (the radiologist was still able to provide a report). For the remaining 94 patients, a usable output was produced, including those for which structural abnormalities were reported (e.g., infarcts or masses). Results are therefore reported for 94 patients. Table 4 shows the results of the visual check on the pipeline outputs on the first 32 patients before and after optimisation. The low number of high-quality tissue-type and hippocampal segmentations with the default pipeline prompted the need for optimising these processing steps. Although several WMH segmentations were rated as medium quality, the most common errors were small false positive clusters in the cortex and overestimation of the lesion size in cases with low WMH load. While there is still room for improvement in WMH segmentation, this was not set as a priority, as the inaccuracies would not have a big impact on the total volume (and on the lesion masking procedure used to improve tissue-type segmentation). Similarly, the other steps of the pipeline (brain extraction and registration) were deemed of high or medium quality, with small inaccuracies unlikely to significantly affect the calculation of most IDPs.

UKB pipeline output quality check and optimisation
After implementing the two optimisation strategies, we observed that the quality of the results improved in both cases; lesion masking significantly improved the quality of tissue-type segmentation, while CSF masking only led to a modest improvement of hippocampal segmentation (with a threshold of the CSF PVE map of 0.7 empirically chosen from a range between 0.6 and 0.9). The modifications to the pipeline were deemed successful and volumes extracted with the optimised pipeline were used in further analyses. The presence of other structural abnormalities caused minimal or no error in the segmentations, not influencing the overall quality of the results and estimated IDPs. Fig. 4 shows the results of the internal validation of the optimised pipeline. Visual rating scores derived from the radiology reports for atrophy and white matter hyperintensities are compared with the corresponding IDPs obtained with the automated pipeline (all normalised for head size, WMH volumes additionally cube-root transformed before entering statistical analysis).

Optimised pipeline validation
The ANOVA tests showed that there was a statistically significant difference in total GM volume between the different atrophy categories.
Table 5 reports the correlations between the IDPs and age (controlling for sex), total ACE-III score and ACE-III memory sub-score (controlling for age and sex), performed as an external validation of the optimised pipeline.
Total GM volume was significantly negatively correlated with age and positively correlated with ACE-III total and memory sub-score. Regarding hippocampal volumes, significant negative correlations were observed between age and both the left and right hippocampi. Left hippocampal volume was also significantly correlated with ACE-III scores (total and memory sub-score), while correlations between right hippocampal volume and ACE-III scores were not significant. We further verified that the association between hippocampal volumes and ACE-III score was significantly different between hemispheres (F(1,89) = 6.062, p = 0.015 on interaction between hemisphere and ACE-III score using a linear mixed model, lmer in R).
Greater total, periventricular and deep WMH volumes were significantly correlated with higher age and lower ACE-III total score. None of the correlations between WMH volumes and ACE-III memory sub-score reached Bonferroni-corrected significance.
Fig. 5 shows how the BHC patients compare to 19,793 generally healthy UKB participants (Nobis et al., 2019). The age distribution of BHC and UKB participants is significantly different (BHC: 78.4 ± 6.2 years, range 66-101 years; UKB: 63.16 ± 7.50 years, range 45-81 years). Consequently, only a very small minority of BHC patients fall within the age range of the nomograms. For these patients, hippocampal volumes are overall comparable to those from UKB participants, with several patients falling in the lower percentiles.

Discussion
We have adapted the UK Biobank brain imaging protocol and pipeline for use in a real-world memory clinic setting, the Oxford Brain Health Clinic. We found that the BHC acquisition protocol was well tolerated by patients. The optimised analysis pipeline produced IDPs of total GM volume, hippocampal volume and WMH volume that were in good agreement with visual ratings from the radiology reports and correlated with factors known to be associated with atrophy and vascular pathology. The age difference between BHC patients and UKB participants highlighted the need for additional scans on elderly healthy controls to improve reference distributions.
Fig. 4. Optimised pipeline internal validation. Values of IDPs (y axes) derived from the automated pipeline, all normalised for head size (WMH volumes were cube-root transformed before entering statistical analyses), against the classifications (x axes) obtained from clinical radiology reports (blind to the results of the pipeline). Scores with <5 cases were grouped with the nearest score for statistical analyses (MTA 3 and 4; Total Fazekas 0 and 1, 5 and 6; PWMH Fazekas 0 and 1; DWMH Fazekas 0 and 1, 2 and 3). See Table 3 for original counts for each score and supplementary Fig. 1 for boxplots with original scores.
Table 5 notes. Mean and standard deviation reported for head size normalised IDPs (all volumes in mm³). Due to non-normal distributions, the WMH volumes were cube-root transformed before entering statistical analyses. Age correlation controlled for sex, ACE-III correlations controlled for age and sex. IDP = imaging derived phenotype; GM = grey matter; ACE-III = Addenbrooke's Cognitive Examination-III; WMH = white matter hyperintensities; PWMH = periventricular white matter hyperintensities; DWMH = deep white matter hyperintensities. * Significant after Bonferroni correction across 18 tests (p < 0.003).
By dividing the UKB brain MRI protocol into clinical and research sections we were able to exploit its technical advances to generate high quality data in a reduced time, while tailoring it to inform dementia diagnosis. The high overall quality of raw T1-weighted and T2-FLAIR scans suggests that these UKB-matched sequences are suitable for patients with memory problems. Moreover, by combining clinical scans with optional research add-ons into a single session and enabling patients to opt for their desired level of research participation, this protocol reduces potential barriers to research engagement. Of the 100 patients who were able to complete the clinical scans, 95 consented to the research use of their data. By prioritising data collection with direct clinical benefit and enabling additional research participation, this approach exceeds the target outlined in the UK Prime Minister's Challenge on Dementia for 10 % of dementia patients to participate in research (UK Department of Health, 2012). This results in a dataset that is representative of the memory clinic population, as the patients are not selectively recruited for a research study. Because patients can also consent to be recontacted for research (current consent rate 73.1 % (O'Donoghue et al., 2022a)), this protocol is also building a valuable participant database for future studies.
There were a number of challenges in adapting the UKB analysis pipeline to this population. Tissue-type segmentation was highly affected by the presence of WMHs, misclassified as GM, due to their similar T1 intensity (Dadar et al., 2018;Melazzini et al., 2021). This inaccuracy has been reported elsewhere with multiple sclerosis (MS) lesions (Battaglini et al., 2012) and WMHs (Dadar et al., 2021). Lesion-filling (i.e. replacing or "filling" the intensity values in the lesion area with intensities that are similar to those in the non-lesion neighbourhood) and lesion-masking (the approach used in this study) are valuable correction tools for MS lesions (Battaglini et al., 2012;Chard et al., 2010). For WMHs of presumed vascular origin, we opted for the use of lesion-masking since WMHs tend to be larger and more confluent than the MS lesions for which lesion-filling was designed (Battaglini et al., 2012).
Although alternative methods have been previously described to adjust for WMHs in GM segmentations (Park et al., 2018), lesion-masking provides a straightforward method yielding remarkable improvements in GM segmentation accuracy. Whilst essential in BHC patients with high WMH burdens, lesion-masking was not detrimental in patients with mild WMHs. Overall, the efficacy and simplicity of lesion-masking supports its widespread application in other pipelines and for other populations.
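As a rough illustration of why lesion-masking changes the GM estimate, the sketch below sums a toy GM partial-volume map with and without excluding voxels flagged as WMH. It is not the BHC pipeline code: the function name `gm_volume_mm3`, the flat-list image representation and the 1 mm³ voxel size are assumptions made purely for illustration, and the actual pipeline may apply the WMH mask at a different processing stage.

```python
def gm_volume_mm3(gm_pve, wmh_mask=None, voxel_volume_mm3=1.0):
    """Sum GM partial-volume estimates, optionally skipping WMH voxels.

    gm_pve: flat list of GM partial-volume estimates in [0, 1]
    wmh_mask: flat list of 0/1 flags (1 = voxel labelled as WMH), or None
    voxel_volume_mm3: volume of one voxel (hypothetical default of 1 mm^3)
    """
    if wmh_mask is None:
        wmh_mask = [0] * len(gm_pve)
    if len(wmh_mask) != len(gm_pve):
        raise ValueError("mask and image must have the same number of voxels")
    # Lesion-masking: voxels flagged as WMH do not contribute to the GM total.
    total = sum(p for p, m in zip(gm_pve, wmh_mask) if not m)
    return total * voxel_volume_mm3


# Toy data (hypothetical): the last two voxels are WMHs that the tissue
# segmentation has misclassified as mostly GM.
gm_pve = [0.9, 0.8, 0.0, 0.7, 0.6]
wmh_mask = [0, 0, 0, 1, 1]

unmasked = gm_volume_mm3(gm_pve)          # includes the misclassified voxels
masked = gm_volume_mm3(gm_pve, wmh_mask)  # excludes them
```

In this toy case the unmasked total is inflated by exactly the partial-volume values of the two WMH voxels, which is the kind of overestimation that grows with WMH burden.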
Hippocampal segmentation is notoriously problematic for patients with more atrophy. Previous studies showed better performance of FIRST in non-atrophic brains (Goubran et al., 2020), despite the inclusion of some Alzheimer's disease patients in its training datasets (Patenaude et al., 2011). Similar inaccuracies have been reported with other hippocampal segmentation tools (Firbank et al., 2008), supporting the widespread need for optimisation. CSF-masking, a novel strategy in this study, led to a modest improvement of the segmentations, despite a very small impact on the calculated volumes. This may be because some segmentation inaccuracies also occur at the GM-WM interface, and future efforts should aim to also improve segmentation accuracy at this boundary. Another possibility is that FIRST segmentations may be inaccurate in shape or location but not necessarily in size. In fact, among the segmentations that were still labelled as low quality after CSF-masking, only one was an outlier in terms of volume. Nevertheless, further optimisation is required to improve the quality of these segmentations, especially for future analyses reliant on hippocampal shape (e.g., vertex analysis). Potential alternative approaches include those based on deep learning, like the one proposed by Dinsdale et al. (2019), which also showed increased performance on Alzheimer's disease and mild cognitive impairment (MCI) scans after transfer learning (Balboni et al., 2022), or the one by Liu et al. (2020), which incorporates hippocampal segmentation and Alzheimer's disease classification. It will be important to test these and other new approaches that might be included in future versions of the UKB pipeline in an unselected clinical population like the BHC. This demonstrates the value of sharing BHC data for the development of these methods.
From our internal validation we found good agreement between IDPs and visual ratings, suggesting that the pipeline can be used to automatically extract meaningful information for memory clinic patients. Our external validation strategy confirmed the well-known associations between age and both atrophy and vascular pathology. We also found that white matter hyperintensity volume was associated with impaired cognitive function, which is consistent with previous studies (Bolandzadeh et al., 2012;Griffanti et al., 2018;Kim et al., 2008). ACE-III scores (total and memory) were significantly associated with the left hippocampal volume, but not the right, with a significant interaction between hemisphere and ACE-III score in the linear mixed model. The left hippocampus has been previously found to be smaller and more strongly correlated with cognitive symptoms in MCI and dementia than the right hippocampus (Ezzati et al., 2016;Muller et al., 2005;Shi et al., 2009). Compared to the right hippocampus, we see smaller left hippocampal volumes with greater variability both before and after correction with CSF-masking (Table 5, paired t-test L < R hippocampus p = 0.003). The smaller variability on the right side may also limit our power to detect correlations with ACE-III total and memory scores in that hemisphere.
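The association analyses summarised above rest on three simple ingredients reported with the results: a cube-root transform of WMH volumes, adjustment for covariates, and a Bonferroni threshold across 18 tests. The sketch below illustrates them under stated assumptions: it adjusts for a single covariate by residualising both variables, which is one common way to obtain a partial correlation and is not necessarily the exact procedure used in this study. The helper names and the toy data are hypothetical, and no p-values are computed.

```python
from math import sqrt


def residualise(y, x):
    """Residuals of y after least-squares regression on one covariate x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


def partial_correlation(x, y, covariate):
    """Correlation of x and y after removing the linear effect of one covariate."""
    return pearson(residualise(x, covariate), residualise(y, covariate))


# Toy data (hypothetical values, not BHC data).
wmh_mm3 = [1000.0, 8000.0, 27000.0, 64000.0, 125000.0, 216000.0]
age = [62, 70, 74, 79, 85, 90]
sex = [0, 1, 0, 1, 0, 1]

# Cube-root transform to reduce the skew of the WMH volume distribution.
wmh_cuberoot = [v ** (1 / 3) for v in wmh_mm3]

# Age correlation controlled for sex.
r = partial_correlation(wmh_cuberoot, age, sex)

# Bonferroni threshold across 18 tests: 0.05 / 18, roughly 0.0028.
bonferroni_alpha = 0.05 / 18
```

Adjusting for two covariates at once (as for the ACE-III correlations, which were controlled for age and sex) would require a multiple regression in place of `residualise`; that step is omitted here to keep the sketch short.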
Finally, we explored the alignment between the BHC and UKB populations using previously published nomograms for hippocampal volume (Nobis et al., 2019). There is a substantial age difference between UKB participants and BHC patients, meaning that UKB data are currently not suitable to be used as a reference population for memory clinic patients. Nomograms used here were generated using 19,793 participants, although data from the latest release including 51,532 participants who underwent imaging (visit 2; see supplementary Fig. 3) are very similar, with a mean age of 64.54 ± 7.81 years and an age range between 44 and 83 years. The use of a sliding window approach further restricts the age range, since the nomograms were designed so that each window contained 10 % of the participants (Nobis et al., 2019). Using alternative fitting methods can help build nomograms that cover a wider age range (Fraza et al., 2021), provided that the uncertainty, especially when fewer samples are available (e.g., at either end of the age range), is carefully considered (Bozek et al., 2022). Recently, 'brain charts' of the four main tissue volumes of the cerebrum (total cortical GM volume, total WM volume, total subcortical GM volume and total ventricular CSF volume) have been generated from over 100,000 scans (including UKB) across the lifespan (Bethlehem et al., 2022). They include the age range of BHC patients, but being generated from multi-site data, they currently require at least 100 scans from healthy controls to estimate study-specific offsets. Moreover, these charts are not yet extended to more fine-grained IDPs that are likely to be more useful for dementia diagnosis, but also more sensitive to variations in image quality. 
Despite the current open challenges, the very active field of normative modelling research has the potential to ultimately enable the integration of IDPs and reference distributions in an imaging decision support tool for dementia diagnosis, the clinical utility of which can be directly assessed in the clinical setting at the BHC.
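To make concrete why a sliding-window approach narrows the usable age range, the following sketch builds percentile curves from windows that each contain 10 % of an age-sorted sample. It is a simplified illustration and not the implementation of Nobis et al. (2019): the function names, the one-participant step, the chosen percentiles and the toy data are all assumptions, and any smoothing of the resulting curves is omitted.

```python
from statistics import mean


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    if not sorted_values:
        raise ValueError("empty window")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def sliding_window_nomogram(ages, volumes, window_fraction=0.10, step=1,
                            quantiles=(5, 50, 95)):
    """Return (mean window age, percentile tuple) for each sliding window.

    Participants are sorted by age; each window holds `window_fraction`
    of the sample and is moved forward by `step` participants.
    """
    pairs = sorted(zip(ages, volumes))
    width = max(1, int(len(pairs) * window_fraction))
    rows = []
    for start in range(0, len(pairs) - width + 1, step):
        window = pairs[start:start + width]
        window_ages = [a for a, _ in window]
        window_vols = sorted(v for _, v in window)
        rows.append((mean(window_ages),
                     tuple(percentile(window_vols, q) for q in quantiles)))
    return rows


# Toy sample (hypothetical): 40 participants aged 45 to 84, with a volume
# that declines linearly with age.
ages = list(range(45, 85))
volumes = [5000 - 20 * (a - 45) for a in ages]

rows = sliding_window_nomogram(ages, volumes)
first_age, last_age = rows[0][0], rows[-1][0]
```

With this toy sample the window centres run from 46.5 to 82.5 years even though participants span 45 to 84 years: the curves cannot reach the extremes of the sample, so a reference population that is already younger than the clinic population covers even fewer patients.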
There are a number of methodological considerations when interpreting these results. Firstly, although the BHC population is non-selective and highly representative of the local memory clinic population, the local population of South Oxfordshire lacks the ethnic diversity of other areas of the country. We hope that by sharing our protocol, analysis pipeline and data as well as collaborating with other brain health clinic initiatives we will contribute to building a more representative dataset of memory clinic patients. Secondly, as described above, the UK Biobank is an impressively large, but nevertheless incomplete reference sample because the age range does not extend to cover that of the memory clinic population. The repeat imaging within the UK Biobank will go some way to address this, and we are currently acquiring additional scans on elderly healthy controls with the same protocol to improve reference distributions. Finally, while the analyses described here give confidence in the use of the UK Biobank acquisition and analysis protocols for the memory clinic population as a whole, we do not yet have an appropriate set of controls and sufficient numbers of patients in each distinct diagnostic group to perform group comparisons. The UKB pipeline produces thousands of IDPs and a sample size calculation will always depend on the specific research question. As such, we have not generated power calculations for individual comparisons. There is a literature on sample size calculations for hippocampal size (i.e. a single IDP), where power calculations and ROC curve analysis have been investigated formally. 
Very similar classification performance (controls vs Alzheimer's disease AUC = 0.85-0.9; controls vs MCI AUC = 0.7-0.8) has been obtained with a highly variable number of subjects (from <50 in each group to hundreds) (Colliot et al., 2008;Estevez-Sante et al., 2020;Jimenez-Huete et al., 2017;Landau et al., 2010;Voevodskaya et al., 2014), often from the same dataset (ADNI) (Estevez-Sante et al., 2020;Jimenez-Huete et al., 2017;Landau et al., 2010;Voevodskaya et al., 2014). The scope of this paper was to present the BHC acquisition protocol, the adapted analysis pipeline and the data, which we hope will contribute to a number of future studies.
To conclude, to the best of our knowledge, this is the first time that UKB imaging (acquisition, analysis pipeline and reference data) has been adapted and applied for an unselected real-world patient population. The UKB pipeline is regularly updated to include new contributions and technical advances in image processing. This could include the automatic detection and quantification of other brain characteristics that are currently assessed in the clinical setting for dementia (e.g., other signs of small vessel disease like lacunes or microhaemorrhages, other structural abnormalities like infarcts, masses or haemorrhages). Having access to data from a real-world memory clinic population helps inform these methodological developments at the same time as testing their clinical utility individually and in combination with other imaging and non-imaging variables (e.g. lifestyle factors, health measures and genetics), thus bridging the gap between research and clinical practice.
The MRI data presented in this paper will be available via the Dementias Platform UK (https://portal.dementiasplatform.uk/CohortDirectory/Item?fingerPrintID=BHC) and access will be granted through an application process, reviewed by the BHC Data Access Group. The BHC Data Access Group will start accepting applications to access BHC data upon publication of the present work. Data will continue to be released in batches as the BHC progresses in order to minimise the risk of participant identification.
The MRI acquisition protocol and the analysis pipeline code are openly available. The MRI data will be available via the Dementias Platform UK (details in supplementary material).