Systematic review and meta-analysis of automated methods for quantifying enlarged perivascular spaces in the brain

Research into magnetic resonance imaging (MRI)-visible perivascular spaces (PVS) has recently increased, as results from studies in different diseases and populations are cementing their association with sleep, disease phenotypes


Introduction
Perivascular spaces (PVS), also referred to as 'Virchow-Robin spaces' (VRS), are fluid-filled cavities surrounding the blood vessels of the brain.PVS are thought to facilitate the uptake of cerebrospinal fluid (CSF) and the removal of metabolic waste products from the brain, but the precise involvement in these processes in humans remains elusive (Wardlaw et al., 2020).PVS are increasingly visible in brain imaging or at autopsy as they become enlarged.Enlarged PVS are associated with increased age (Francis et al., 2019), other markers of small vessel disease (SVD) (Duering et al., 2023;Wardlaw et al., 2013), and may be more prevalent in individuals with vascular risk factors (e.g., hypertension) (Zhu et al., 2010).A high burden of enlarged PVS may be associated with worse brain health outcomes, including increased risk of vascular dementia (more so than Alzheimer's disease (AD) (Smeijer et al., 2019) and stroke (Debette et al., 2019).
PVS appear as ovoid or tubular structures, with bright (hyperintense) signal in T2-weighted (T2w) magnetic resonance imaging (MRI) and dark (hypointense) signal in T1-weighted (T1w) MRI, and are occasionally visible on fluid-attenuated inversion recovery (FLAIR) imaging (Duering et al., 2023;Wardlaw et al., 2013).Larger ovoid PVS may appear similar to lacunes-also round CSF-filled cavities and thus with similar signal characteristics to PVS (Duering et al., 2023;Valdés Hernández et al., 2013).PVS may also coalesce with one another forming clusters or irregular strings, exhibit tortuosity, have irregular diameters, and may traverse adjacent imaging slices, complicating their quantification when using multiple 2D slices from images with anisotropic voxels, commonly acquired in clinical practice.
Several automated and semi-automated computational pipelines and methods have been developed to quantify PVS.These methods are heterogeneous in terms of their implementation, validation, and application (Barisano et al., 2022a;Moses et al., 2023;Pham et al., 2022).The principles and drawbacks of some of these methods have been summarised previously (Barisano et al., 2022a;Moses et al., 2023;Pham et al., 2022), and suggestions about further requirements to increase the understanding of PVS by means of their computational assessment using MRI have been drawn (Pham et al., 2022).But there has been also a wealth of computational developments to either enhance performance of previously developed methods, establish their limits of validity, or reduce noise effects, or increase the accuracy of their output, which have not been summarised or reviewed.Neither has this wealth of information been systematically identified or meta-analysed.
The relevance of PVS has become increasingly evident in areas not studied before such as in relation to the occlusion of sinonasal cavities (Sáenz de Villaverde Cortabarría et al., 2023;Valdés Hernández et al., 2024b), in patients with the metabolic syndrome (Hayden, 2023), or in patients with amyotrophic lateral sclerosis (Schreiber et al., 2023) and, with it, the efforts in improving the accuracy of the currently accepted as state-of-the-art computational methods to assess them in the presence of confounding pathology such as white matter hyperintensities (WMH) or lacunes.Consequently, we systematically reviewed and meta-analysed the literature in an attempt to fill this void.In this systematic literature review of (semi-/fully-) automated quantification of PVS, we not only identify and meta-analyse the data from the PVS quantification methods that have been developed, and summarise how they have been applied in studies of PVS associations with health and lifestyle indicators, but also analyse the improvements to these methods, and the impact of the computational efforts around this topic.

Protocol registration
We registered this systematic review protocol with the International Prospective Register of Systematic Reviews (PROSPERO), registration number: CRD42022359951 (September 2022).We planned to metaanalyse estimates where methodological and clinical characteristics of contributing studies were similar.

Search strategy
We searched PubMed, Web of Science, and Google Scholar for literature published between January 1990 and September 2023.We also included any hand-selected papers identified from references in relevant literature reviews.Three reviewers (JMJW, MCVH, and JB) conducted the literature search.Each paper was assessed by two reviewers and discrepancies were resolved by discussion.After running pilot searches, we decided on the following search terms (Table 1).

Inclusion and exclusion criteria
We included studies that described the development, improvement, or application of computational (i.e., semi-/automated) PVS quantification methods in human brain MRI.
Exclusion criteria included records without accompanying full-text; studies not reported in English; animal model/pre-clinical studies; studies of PVS using only visual or manual (i.e., non-computational) PVS quantification; studies reporting computational lesion quantification that has not been applied to PVS; studies reporting results from the application of a computational PVS quantification methods but which did not cite or describe the PVS quantification method in-text or in a prior publication or provide any detail.Studies on non-parenchymal fluid dynamics and glymphatic clearance function metrics were also excluded.

Data extraction and risk of bias assessment
Six reviewers (JMJW, JB, RDC, RB, LB, and MCVH) extracted the data.To cross-check data entry, a reviewer (MCVH) performed double extraction independently and blind to extraction results.We grouped studies into 'method development' (i.e., the paper focuses primarily on the development and validation of a new proposal for identifying, quantifying, or segmenting PVS), 'improvement/validation' (i.e., the paper either provides evidence of a certain PVS quantification issue or proposes an approach to deal with it), and 'application' (i.e., the paper uses a previously presented approach to assess PVS to analyse results in relation to clinical data) categories, and extracted relevant data accordingly.For all studies, we extracted data relating to year of publication, authors, MR sequences/field strengths, population type and sample size.For 'method development' and 'improvement/validation' studies, we extracted data relating to pre-processing steps, reference standards, computational PVS method used, and accuracy and/or validation results.For 'application' studies, we additionally extracted more detailed information on population type, variables assessed in relation to PVS burden, and results.
We conducted a risk of bias assessment on included studies, based on the QUADAS-2 quality assessment tool (Whiting et al., 2011).Risk of bias plots were produced using the Risk of Bias Visualization tool (McGuinness and Higgins, 2020).

Statistical analysis
We conducted frequency analyses for the data extracted, and tabulated the results.For categorical variables (e.g., computational approach, MRI sequences, etc.), possible values were coded to enable the frequency analyses.Forest plots were generated using MedCalc (https ://www.medcalc.org/).For articles reporting correlation coefficients, fixed effects were determined using a Fisher Z transformation of these coefficients.The random effects model (DerSimonian and Laird, 1986) general estimate was given after analysing heterogeneity using the I 2 and Cochran's Q statistics (Higgins et al., 2003).We also used bar charts to facilitate comparability in the numeric results.

Search results
We identified 2978 potentially relevant records.Before screening, Table 1 Search strategy.
Grouping Terms #1 "Perivascular spaces" OR "PVS" OR "EPVS" OR "Virchow-Robin spaces" OR "VRS" #2 "Enlarged" OR "Dilated" OR "Visible" OR "Dysfunction" OR "Enlargement" OR "Visibility" OR "Dilation" #3 #1 AND #2 #4 Comput* OR "Segmentation" OR Quantif* OR Algorithm* OR "Estimation" OR Automat* OR Semi-automat* OR Semiauto* Search: #3 AND #4 J.M.J. Waymont et al. we removed 354 results because they were duplicates or were not in English.During title and abstract screening, we screened titles and abstracts of 2624 records, and excluded 2427 of them.We sought retrieval of the remaining 197.Among these, five were not accessible either publicly or via our institutional library, five were conference abstracts, and two considered PVS quantification for histology samples.Therefore, 185 studies were ultimately assessed for eligibility.From them, 73 studies were ineligible and 112 studies that met our criteria were included in this review.A flow chart of the identification and screening process is provided in Fig. 1.

Overview of included studies
Of the 112 included studies, 45 fitted under 'method development', eight under 'improvement', and 59 under 'application'.Four separate PVS quantification methods participated in the Vascular Lesions Detection Challenge 2021, as documented in Sudre et al. (2022).We treat them as four distinct records, thus increasing the count of 'method development' studies to 48, and the total number of studies to 115.Overviews of all included studies within these three categories are presented in their respective sections (method development: 3.3, Table 7; improvement: 3.4, Table 9; application: 3.5, Table 10).

Publication timeline
The earliest record identified was published in 2002 (Kruggel et al., 2002).Method development publications became more prevalent from around 2016 onwards reaching a peak in 2019 (16.67% from all publications in this category), and publications applying computational PVS Fig. 1.Flow chart of identification, screening, and selection process.
J.M.J. Waymont et al. quantification increased from around 2020 (Fig. 2).Supplementary Table 1 provides a breakdown of publications by type and year.

Populations and sample sizes
Clinical (i.e., patient) and non-clinical (e.g., healthy volunteers, community cohorts) samples have been used in the development, improvement, and application of computational PVS quantification methods, with non-clinical samples (e.g., community-dwelling cohorts) forming the largest proportion by a small margin in 'method developments' and 'improvement' groups.Four sources describing methods (Uchiyama et al., 2008(Uchiyama et al., , 2009;;Zhang et al., 2016Zhang et al., , 2017) ) did not specify the type of population.Table 2 shows population type overall and by method/improvement/application categories.From the 51 studies that included a clinical sample, 23 included a control population.From the clinical samples, patients with cerebrovascular or neurodegenerative diseases were more represented than patients with psychiatric conditions, communicable diseases, or genetic diseases.
Age groups of populations ranged from children and adolescents through to older adults.Studies comprising only older adults (60+ years) and population studies including adults from 18 years old onwards prevailed.Ten studies (predominantly method development studies) did not provide information about participants' age.Table 3 shows population age groupings used in all and each of the study categories.
Samples sizes ranged from one to 39,976 participants.Two studies that used in-silico and/or physical phantoms (Duarte Coello et al., 2023, 2024) did not involve the use of human data.The median sample size across all study types was 106 participants (IQR = 456.5).Application studies had the largest median sample size (161 participants) but also the greatest variance (IQR = 482.5).Sample sizes distribution by groups are shown in Table 4.

MRI sequences and field strengths
The included studies used a variety of MRI sequences, including T1w, T2w, proton density (PD), fluid-attenuated inversion recovery (FLAIR), and T2*w.Across all study types, the combined use of T1w and T2w MRI was the most frequently used (27.83% of studies), followed by the use of T2w imaging only (20.87% of all studies).One study used synthesised T2-type images for the development of a Digital Reference Object to evaluate PVS enhancement methods (Bernal et al., 2022).Although many studies of all types reported a combination of two or more sequence types, the majority used a single sequence, either T1w or T2w, to identify the PVS.Frequencies of individual and combined sequences are presented in Table 5.
The most commonly used MR field strengths were, in order, 3T, 1.5T, and 7T.Method development studies used 1.5T imaging more often than 3T or 7T (50.00%, 33.33%, 22.92%, respectively), with seven studies using MRI acquired at two different magnetic fields and one study (Dubost et al., 2019c) using MRI acquired at three different magnetic fields: 1T, 1.5T and 3T.In general, 17 studies used imaging from more than one field strength, typically a combination of 1.5T and 3T images or of 3T and 7T.Five 'method development' studies and one 'improvement' study (Paz et al., 2009) did not report the magnetic field of the MRI used.Application studies more commonly used 3T imaging (71.19%).One application study (Jokinen et al., 2020) used a small number of images from a 0.5T scanner.Frequencies of field strengths used in the included studies are presented in Table 6.

Voxel sizes and slice thickness
Voxel sizes varied amongst the samples.Fourteen out of the 48 methods proposed were 2D, owed to the anisotropy of the voxels in the images of the samples used.Consequently, application studies using those methods comprised images with largely anisotropic voxels (e.g., 0.4 × 0.4 × 5 or 6 mm 3 ) (Tables 7 and 10).But not all 3D methods studies used images with isotropic voxels, as in many cases pipelines involved resampling and interpolation to convert the images to 1 mm isotropic (Table 7).

Method development studies
Summaries of all method development studies (N = 48) are presented in Table 7, ordered by year of publication.Twenty-seven were journal articles, 16 published in conference proceedings, and five in preprint repositories.It is worth noting that the four methods showcased by Sudre et al. (2022) submitted as part of the challenge organised by the publication authors were, at the time this review was conducted, in a preprint repository.Despite the semi-automatic method proposed by Smith et al. (2020) being also at a preprint repository, this method per-se has already been applied by a clinical study (Langan et al., 2022).J.M.J. Waymont et al.

Computational methods
Seven of the 48 'method development' studies used a semiautomated approach, while the remaining 41 used a fully automated one.Broadly speaking, computational methods included filtering, machine learning, deep learning, combined approaches and intensity thresholding (i.e., as main technique) (Table 7).It is worth noting that most, if not all papers, involve thresholding.CNNs for example, often output a response map that is thresholded to get the binary masks.Also, filtering techniques generate a response that is thresholded to obtain the PVS proxies.The most frequent approach was the use of deep learning,             .When applied to a sample of n=1782, agreement with a semi-quantitative visual rating done by an independent expert rater for DWM R 2 (thr0.1)= 0.38, R 2 (thr0.5)= 0.45, R 2 (thr0.9)= 0.47, p < 0.001, and BG R 2 (thr0.1)= 0.02, R 2 (thr0.5)= 0.05, R 2 (thr0.9)= 0.04, p < 0.001.(Yang et al., 2021) PVS rating using CNN-based deep learning model using 3 slices of the T2w (basal ganglia only).
Step  proposed in 16 studies (33.33%).Frequencies of all types of approaches proposed are presented in Table 8.From all the sources in this category, only nine provide their code in a public repository, 28 do not provide the code, and one provides via to request the code (Supplementary Table 2).The rest use publicly available resources like the Frangi vesselness filter (Frangi et al., 1998) implementation by Dirk-Jan Kroon (2009) hosted by MATLAB Central File Exchange (https://www.mathworks.com/matlabcentral/fileexchange/24409-hessian-based-frangi-vesselness-filter), or the 3D multi-radii optimally oriented flux responses for curvilinear structure analysis implementation by Max W.K. Law (2013) also hosted by MATLAB Central File Exchange (https://www.mathworks.com/matlabcentral/fileexchange/41612-optimally-oriented-flux-oof-for-3d-cu rvilinear-structure).The FMRIB Software Library (FSL, https://fsl.fmrib.ox.ac.uk/fsl/fslwiki) is the preferred software used for sequence co-registration, occasionally for brain extraction, and in two sources (González-Castro et al., 2016b, 2017) it is also used for generating priors of the basal ganglia region.AFNI (https://afni.nimh.nih.gov/) is used by two sources (Boespflug et al., 2018;Williamson et al., 2022) for co-registration and skull-stripping, and elastix (https://elastix.lumc.nl/) is used for co-registration by one (Spijkerman et al., 2022).Not all methods correct for b1 inhomogeneities in the magnetic field, compensate for the presence of noise, or refer to a software/algorithm for normalising the intensities of the images prior to segmentation.The software most commonly used for generating the regions of interest is Freesurfer (https://surfer.nmr.mgh.harvard.edu/)(See Supplementary Table 2).

Model training
Thirty-two (66.68%) of the 48 quantification methods developed required training.Training and validation approaches included 'leave one out' (5 studies, 10.42%), 'k-fold cross-validation' (7 studies, 14.58%) and variations involving training data in one subset of data and testing in another (14 studies, 29.17%), training in one subset, validating in another, and testing in a further subset (4 studies, 8.33%), and a progressive/iterative training including altering thresholds and increasing sample sizes (2 studies, 4.17%).

Reference standard
The reference standard, or 'ground truth', most frequently used in PVS quantification method development was visual rating scores (13 studies, 27.08%).Other visual reference standards included visual count of PVS (5 studies, 10.42%) and 'visual inspection' (often lacking further detail of what this entailed, but where mentioned included differentiating PVS from lacunes; 5 studies, 10.42%).Manual segmentation was used in 11 (22.92%)studies, and manual annotation in eight (16.67%).
Three studies (6.25%) did not describe a ground truth or reference standard, and a further 3 studies (6.25%) used more idiosyncratic reference standards (e.g., heuristically enhanced PVS, and probabilistic boxes for previously manually segmented PVS).

Populations and sample sizes
A brief overview of populations and sample sizes across all study types is provided in Section 3.2.2,Tables 2-4.The most frequent age group in which computational PVS quantification methods have been developed was in mid-and older adults (participants aged 40 years and over; 11 studies, 22.92%), followed by general adult (18 years and over; 10 studies, 20.83%), and older adult (60 years and over; 10 studies, 20.83%).Non-clinical populations were more widely represented than clinical (27 studies, 56.25% non-clinical, 17 studies, 35.42% clinical).Of the method development studies that used clinical populations, medical conditions within these populations included hereditary conditions (one study involving hereditary cerebral amyloid angiopathy carriers; 2.08%), six studies involving stroke patients (i.e., cerebrovascular; 12.5%), nine studies with samples with neurological or neurodegenerative diseases (e.g., Alzheimer's disease, multiple sclerosis; 18.75%), and one study involving patients with major depressive disorder; 2.08%).However, there is considerable overlap between the samples used by the different studies.2019) use more than 1000 scans from the Rotterdam Scan Study.A manually annotated subset of it is used in the four studies described by Sudre et al. (2022).Ballerini et al. (2018) and Ramirez et al. (2011Ramirez et al. ( , 2015) ) use data from the Sunnybrook Dementia Study.Ballerini et al. (2018), González-Castro et al. (2016b, 2016a, 2017) and Wang et al. (2016b) all use scans from the Edinburgh Mild Stroke Studies.Other common data sources used are the Human Connectome Project (Choi et al., 2020;Lan et al., 2023;Sepehrband et al., 2019), the Lothian Birth Cohort 1936 (Ballerini et al., 2016;Wang et al., 2016b), the Multi-Ethnic Study of Atherosclerosis (MESA) cohort (Rashid et al., 2023) and the Southall And BRent REvisited (SABRE) ( (Sudre et al., 2022), 4 participating studies).All these are well-known clinical and population studies with data available either by request to the data holders or through public databases, which would facilitate comparability in the results.However, it is unclear if PVS reference segmentations could be made available, as they are not included in any of the primary data sources mentioned.Studies using small or not publicly available samples (22 studies), or which did not specify the source of data used (4 studies) made up 54.17% of the total number of studies.

Assessing computational PVS quantification
A number of different approaches to assessing the quality of PVS quantification were used.This, together with the heterogeneity in image acquisitions and settings, make direct comparisons difficult.More commonly used assessments include measures of associations between computational output and reference standard (e.g., correlations), measures of accuracy (e.g., false positive/false negatives, ROC analyses), and measures of spatial agreement.Eighteen studies calculated association (correlation, regression) with reference standard measurements or scores (Table 7).We used the best performance reported amongst the myriad of evaluations carried out by each study, to discuss the overall performance of the different methods by category (i.e., semi-automatic vs. automatic, CNNs vs. conventional filtering, etc.) and estimate the overall magnitude of these correlations (Fig. 4).Not surprisingly, semiautomated methods (Ramirez et al., 2011;Wuerfel et al., 2008) achieved the highest correlation between observer-derived measurements.The approach presented by Choi et al. (2020), which uses a CNN to remove false positives from segmentation priors derived from thresholding the output from the Frangi filter, also reported very high intra-class correlation coefficient, but this was only validated in 10 scans.The  2018), reported the highest variability in the method's performance.Overall, correlations were on average β = 0.881, 95% CI = [0.8690.892] (see diamond in pooled analysis of Fig. 4).Accuracy metrics varied, largely depending on the reference standard, but all methods reporting accuracy results used various metrics (Table 7 and Supplementary Table 3).Average Dice similarity coefficient (DSC) values from sources that report them are illustrated in Fig. 5.The majority are between 0.6 and 0.8.Only (Ranti et al., 2022) stands out with a value of 0.9914 (Table 7, Fig. 5).

Improvement studies
Summaries of all improvement studies identified, which fitted our inclusion criteria (N = 8), are presented in Table 9. Targets for quantification improvement included image compression, motion artefacts and noise in general, image texture, comparison of vessel likelihood filters, and any source from which to derive useful recommendations for improving any of the methods developed up to date.Three sources are chapters in conference proceeding books (Bernal et al., 2020(Bernal et al., , 2021;;Duarte Coello et al., 2023).The rest are journal publications.

Computational approach
Five out of the eight studies in this category utilised Frangi filtering and thresholding approaches (with three also assessing RORPO and/or Jerman filters).One study used a Deep Learning (CNN) approach (Zong et al., 2021), and two used mathematical observer models (Duarte Coello et al., 2024;Paz et al., 2009).

MR sequences and field strengths
All studies aimed at improving the detectability and segmentation accuracy of PVS used T2w imaging, or synthesised T2w-like images in one case, as this is the preferred sequence to identify PVS as per STRIVE recommendations (Wardlaw et al., 2013).Three studies used 3T images, two used 7T images, and one study did not specify field strength.The study that used T1w images (Valdés Hernández et al., 2024a) found inconsistencies in the results compared with those from T2w, and suggested these might reflect presence of vessel mineralisation, hypointense, as PVS, in T1w MRI sequence.

Populations and sample sizes
Populations used in improvement studies included clinical groups (multiple sclerosis, mild stroke), and healthy adults.Three studies used synthetic data.Sample sizes ranged from N = 10 to N = 228 participants, although one study (Bernal et al., 2022) used intensity values from T2w

Improvement study results
Each of the eight improvement studies had different targets (Table 9).One study examined image compression (Paz et al., 2009), finding that JPEG 2000 compression does not jeopardise PVS detectability when bit rates are greater than 0.125bpp.One study proposes a framework for PVS quantification by analysing image texture features and applying a total variation filter if necessary, to improve PVS quantification in noisy versus less noisy or clean imaging (AUC = 88.8%)(Bernal et al., 2020).Three studies examined the impact of motion artefacts on PVS quantification, finding that correcting motion artefacts improved PVS quantification when using Frangi filtering plus thresholding (Bernal et al., 2021) and when using a convolutional neural network approach (Zong et al., 2021), or that simply using higher thresholds after filtering for noisy images compared with the thresholds used for clean images will be enough (Valdés Hernández et al., 2024a).Comparison of the performance of Frangi, RORPO and Jerman filters for highlighting PVS candidates was the subject of three studies.One study compared the three filtering approaches and found that RORPO performed best in isotropic imaging, but that all filter types were unable to distinguish between PVS and other hyperintense features (Bernal et al., 2022).Another compared Frangi with Jerman and found more incomplete and fragmented detection of long and thin PVS with Jerman than with Frangi (Valdés Hernández et al., 2024a).Another compared the performance of RORPO and Frangi in phantoms to conclude that RORPO, in addition of having less adjustment requirements, detected smaller cylinders in their entirety more consistently while Frangi seemed to be best suited for voxel sizes equal or larger than 0.4 mm-isotropic and cylinders larger than 1 mm diameter and 2 mm length (Duarte Coello et al., 2024).One study diverges from the rest in terms of aims (Duarte Coello et al., 2023), as its focus is the improvement in the accuracy of the quantification of the PVS morphometrics, namely length and width, and not in the detection or segmentation of PVS per se.PVS morphometrics are traditionally assessed by measuring the maximum and minimum diameters of the ellipsoid that encircles each PVS (Ballerini et al., 2020b).The traditional approach, implemented in Matlab by the function regionprops3, is accurate if the PVS are linear and straight, but not if they are curved or elongated with irregular shapes.Three sources have linked data/code repositories: Bernal et al. (2022), Duarte Coello et al. (2024), and Valdés Hernández et al. (2024a) to enable compatibility in follow-up improvements, reproducibility, and fair comparability of these results with those from further analyses.

Application studies
Summaries of all included application studies (N = 59) are presented in Table 10.Methodological and clinical differences between the studies ruled out meta-analysis of the estimates obtained.

Computational PVS quantification methods used
Many of the computational methods used in the application studies have been described in Section 3.3, and have been applied directly or used as the basis of PVS quantification approaches in application studies.The most frequently applied method was thresholding the response of the Frangi filter, as described by Ballerini et al. (2016Ballerini et al. ( , 2018) ) (12 studies) or in the frameworks proposed by Sepehrband et al. (2019) (10 studies), Smith et al. (2020) (1 study), or Ranti et al. (2022) (1 study), which in total make up for 54.24% of the application studies.Ten studies use CNN configurations, mainly U-Net and U-ResNet (16.95%), to assess PVS burden, and four studies use this approach to post-process the output from the Frangi filter.Morphological detection as proposed by Schwartz et al. (2019), followed by thresholding is used in six studies, while a study uses the morphological detection approach described by Sato et al. (1998).
Computational PVS quantification methods applied but not described in Section 3.3.include approaches based on Guerrero et al. ( 2018) method (a deep learning/CNN approach), Yushkevich et al. (2006) method (a semi-automated, 6-connected voxel active contours approach), and seven in-house methods (including thresholding and filtering methods, and approaches based on Dubost et al. (2019b) and Sepehrband et al. (2019) methods).One study refers using the 2D U-Net approach for multisite learning as developed by Remedios et al. (2020) for segmenting haemorrhages in CT scans.These were not among 'method development' papers included in this review as they did not independently meet inclusion criteria (e.g., the quantification approach may not have been designed explicitly for PVS detection (Guerrero et al., 2018;Remedios et al., 2020), or may have relied heavily on a manual component (Yushkevich et al., 2006)).A list of the computational PVS quantification approaches and the frequency of their use in the application studies identified in this review is presented in Table 11.

MR sequences and field strengths
Frequencies of MR sequences and field strengths used in 'application' studies in comparison with other study types are outlined in Section 3.2.3,Tables 5 and 6.The most common combinations of MR sequences used in these studies were T1w and T2w (19 studies).This was also the combination of sequences used by Choi et al. (2020), which can be considered both: a 'method development' and an 'application' study.T1w alone, and the combination of T1w, T2w, T2*w and FLAIR (10 studies each) were also commonly used.This combination, however, was not used specifically for identifying PVS, but was used by full pipelines to extract regions of interest and obtain other imaging markers that were analysed by the studies in relation to PVS.
Application studies most frequently used 3T imaging (42 studies, 71.19%), although a large proportion used 1.5T imaging (20 studies, 33.9%) and only five studies used 7T (8.47%).This differs from method development studies, where distribution of field strengths was fairly evenly distributed, with a slight bias towards 1.5T imaging.

Population types and sample sizes
Application studies were almost evenly split between those using a clinical population (50.85%) and those using only a non-clinical population (49.15%).Participants were most commonly independently living older adults (>60 years; 18 studies, 30.51%), followed by patients with cerebrovascular disease-related phenotype (e.g., stroke patients, hypertensive patients, patients with artherosclerosis) (9 studies, 15.25%).In terms of population characteristics, application and method development studies shared similar distribution (i.e., proportion).Median sample size of PVS application studies was N = 161 (IQR = 482.5),with two multicentre epidemiological studies with sample sizes greater than    PVS enlarged long-duration spaceflight.WM-PVS were correlated with a reduction in the subarachnoid space at the vertex.WM-PVS increase was correlated with mission duration.BG-PVS were associated to previous spaceflight experience.(Barnes et al., 2022) Topological relationship between PVS and WMH, and potential PVS influence on WMH progression across 3 scanning waves.(Ballerini et al., 2018) T1w, T2w, T2*, FLAIR., 1.5T   progressor" group (n = 7), there was a significant relationship between the change in the CSF total-tau (t-tau) levels and the change in the PVS burden (PVS cluster (ß = 1.46, p = 0.014; PVS volume ß = 14.6, p = 0.032), and between change in NfL levels and change in the PVS burden over time (PVS cluster ß = 0.296, p ≤ 0.001; PVS volume ß = 3.67, p = 0.002).(Perosa et al., 2022) Investigation of histopathological correlates of PVS and relationship with vascular amyloid-B in cerebral amyloid angiopathy (CAA) using deep learning algorithm (3D Unet).
For the random forests model that used 90 PVS features, RMSE in estimating the subjects' age was 9.53 ± 0.09 years, and the correlation between estimated age and actual age was 0.875 ± 0.003.The combined set of PVS burden and location features provided the best estimation performance.(Sibilia et al., 2023) Understand the potential interactions between PVS, cortisol, hypertension, and inflammation in the context of cognitive impairment (Sepehrband et al., 2019), but only using T1w T1w, 3T ADNI-1 population with all biomarker info (N=465, age range=55-90).But final sample is N=456 after removing outliers from PVS segmentation In the centrum semiovale, higher levels of inflammation reduced cortisol associations with PVS volume fraction.PVS were negatively associated with ACE only when interacting with TNFr2 (a transmembrane receptor of TNF).BG PVS were positively associated with TRAIL (a TNF receptor inducing apoptosis).(Shih et al., 2023) Analyze the relationship between sleep and the PVS in cognitively healthy adults across the aging continuum.(Sepehrband et al., 2019) T1w, T2w, 3T N=725 cognitively healthy participants (36-100 years old) from the HCP-Aging Lifespan Release 2.0.
Older adults who had better sleep quality and sleep efficiency had larger BG-PVS volume fraction.However, sleep measures were not associated with CSO-PVS volume fraction.BMI influenced the BG-PVS across middle-aged and older participants.The effect of sleep quality on PVS volume fraction was mediated by BMI.There were significant differences in PVS volume fraction across racial/ ethnic groups.(Sotgiu et al., 2023) Explore the associations between brain PVS and autism: clinical and pathogenic implications (Sepehrband et al., 2019)  The relative area ratios of PVS in the bilateral frontal cortex and the BG were significantly higher in the OSA group, and even more in the severe OSA group than in the mild-(continued on next page) J.M.J. Waymont et al. 30,000 (Duperron et al., 2023;Evans et al., 2023) and five with more than 1,000.Broad population type and age-groupings used in application studies compared to method development and improvement studies are presented in Tables 4 and 5, respectively.

Application results
Six areas of interest emerged in common across a number of application studies: age, sex, hypertension, diabetes, sleep, white matter hyperintensities (WMH), and cognition.
3.5.5.1.Age.Eleven studies reported a positive association between computational PVS measures and age (Aribisala et al., 2023;Barisano et al., 2021;Choi et al., 2020;Evans et al., 2023;Huang et al., 2021b;Kern et al., 2023;Park et al., 2023;Piantino et al., 2021;Ranti et al., 2021;Wang et al., 2016a;Zong et al., 2020), one of which was a study in young adults (mean age = 28.8years).But studies on life-time trajectories point at varying patterns of these associations throughout the lifespan (Kim et al., 2023a;Lynch et al., 2023).One study of adolescents (N=118) found no association between age and PVS burden (Piantino et al., 2020), and neither another study on children (Sotgiu et al., 2023).A study on chronic traumatic brain injury did not find PVS burden being associated with brain age (Hicks et al., 2023).Studies that analysed the PVS association with age in different regions had conflicting results, with the majority reporting a positive association/correlation with age in all regions, and one study finding a negative association between age and PVS volume in the temporal region ((Charisis et al., 2023), N=1026).

Table 11
Frequency of computational PVS applications in included studies.
Only one study involving cognitively normal and mild cognitive impaired individuals found higher CSO PVS fraction in females compared with males (Sepehrband et al., 2021) after adjusting for age and years of education.However, the same study reported CSO PVS fraction being higher in males in the cognitively normal group and no sex differences in CSO PVS fraction in the mild cognitively impaired group in absence of potential confounds.
3.5.5.3.Hypertension.Eleven out of thirteen studies identified a positive association between hypertension and PVS burden (Ballerini et al., 2020b;Charisis et al., 2023;Cheng et al., 2022;Duperron et al., 2023;Evans et al., 2023;Haddad et al., 2022;Huang et al., 2021b;Tachibana et al., 2024;Wang et al., 2021;Wang et al., 2016a;Zebarth et al., 2023).Duperron et al. (2023), the largest study amongst the ones included, found causal associations of high blood pressure with BG and hippocampal PVS, and of BG PVS and hippocampal PVS with stroke, accounting for blood pressure, even after Mendelian randomisation.In Wang et al. (2021), basal ganglia PVS count was associated with hypertension, but not PVS volume.Piantino et al. (2021) found no association between systolic or diastolic blood pressure and PVS burden, and Choi et al. (2020) did not find blood pressure to be associated with PVS volumes..5.5.4. Diabetes.No association between PVS burden and diabetes was found in two studies (Ballerini et al., 2020b;Wang et al., 2016a), but a positive association was identified in a third study between white matter PVS burden and diabetes (Wang et al., 2021), and in another study exploring specifically the putative role of PVS burden in type 2 diabetes mellitus (T2DM) (Zebarth et al., 2023).This study also found that indirect effects of T2DM on volumes of other SVD markers (i.e., WMH and lacunes) were mediated by the PVS burden in the white matter.

3
3.5.5.5.Sleep.The relationship between sleep and PVS burden was explored in nine studies (Aribisala et al., 2023;Berezuk et al., 2015;Hicks et al., 2023;Lysen et al., 2022;Barisano et al., 2021;Piantino et al., 2021;Shih et al., 2023;Sotgiu et al., 2023;Wang et al., 2023), with all except two (Barisano et al., 2021;Hicks et al., 2023) identifying an association between at least one metric of sleep and PVS burden.Berezuk et al. (2015) and Lysen et al. (2022) reported a negative correlation between both sleep efficiency and PVS burden.Berezuk et al. (2015) further reported a negative association between the duration of N3 (i.e., deep sleep) and PVS burden, and a positive association between wakefulness after sleep onset (i.e., number of interruptions to sleep) and BG-PVS burden.But Lysen et al. (2022) found no associations between any other metric of sleep and PVS burden in any location.In a very heterogeneous sample (Shih et al., 2023) found that older adults who had better sleep quality and sleep efficiency had larger BG-PVS volume fraction, while CSO-PVS volume fraction was not associated with sleep measures.But this study also found that the effect of sleep quality on PVS volume fraction was mediated by BMI, and that there were significant differences in PVS volume fraction across racial/ethnic groups.Piantino et al. (2021) reported a significant association between poor sleep and PVS volume, with PVS volume greater in those with poor sleep and mild traumatic brain injury (TBI) than those with good sleep and mild TBI.Aribisala et al. (2023) found that CSO-PVS volume mediated 5% of the associations found between sleep parameters and brain structural changes.Sotgiu et al. (2023) reported that white matter PVS were related with insomnia in autistic children, and Wang et al. (2023) found that PVS volume ratios were associated with obstructive sleep apnoea (OSA), showing higher accuracy for distinguishing groups of patients with different degrees (i.e., severity) of OSA.
3.5.5.6.White matter hyperintensities.Nine studies reported positive associations between PVS burden and WMH volumes (Aribisala et al., 2023;Ballerini et al., 2020b;Barnes et al., 2022;Evans et al., 2023;Huang et al., 2021a;Jian et al., 2023;Karvelas et al., 2023;Kern et al., 2023), with the latter finding the association only for PVS in the BG and not in the CSO.Two of them found also topological relationships between the occurrence of PVS and the location of WMH deep clusters (Barnes et al., 2022;Huang et al., 2021a).A study in chronic TBI (Hicks et al., 2023) reported a positive association between PVS burden, given as prevalent ratio rate, and bilateral lesions.However, it is unclear if the lesions referred are only as a consequence of the injuries or including also WMH.
3.5.5.7.Cognition.Fifteen studies explored the relationship between cognition and PVS with results varying depending on the characteristics of the cohorts and the cognitive domains assessed.Hicks et al. (2023), for example, reported PVS burden impacting verbal memory but not any other cognitive domain.Two studies found no direct or significant association between general cognition (general cognitive ability, Montreal Cognitive Assessment score) and PVS burden (Ramirez et al., 2022;Valdés Hernández et al., 2020).(Valdés Hernández et al., 2019) reported PVS burden as predictive of reduced fluid intelligence in Huntington's disease patients but not in family members without overt disease, while Sotgiu et al. (2023) did not find association between PVS burden and IQ in autistic children In mild cognitive impaired (MCI) and dementia, all studies reported different measures of PVS being more relevant in MCI (Bown et al., 2023;Sepehrband et al., 2021) and frontotemporal dementia (Moses et al., 2022) groups than in controls.Jokinen et al. (2020) reported PVS volume to be a significant predictor of processing speed, decline in processing speed, decline in memory, and of total vascular dementia assessment (VADAS-cog) score.PVS burden was related to poor cognition in CADASIL (Karvelas et al., 2023), and in Huntington's disease (Li et al., 2023;Valdés Hernández et al., 2019).Kamagata et al. (2022) reported higher PVS volume fraction (total, in basal ganglia, and white matter) in Alzheimer's disease (AD) patients compared to controls, and higher white matter PVS volume fraction in the mild cognitive impaired (MCI) group compared to controls.Hamilton et al. (2021a), Hamilton et al. (2021b) reported that worse total SVD burden (PVS-inclusive) was associated with cognitive decline, but did not report PVS-specific results.

Risk of bias assessments
Risk of bias assessments were conducted based on the QUADAS-2 framework, with potential sources of bias assessed including: patient selection, index method bias (i.e., does the method used to quantify PVS introduce bias), reference accuracy (i.e., accuracy of ground truth), reference blind to index (i.e., was PVS quantification influenced by prior knowledge of ground truth), same reference for all, inclusion of all participants, method/application matches review question, applicability, and overall assessment.Applicability refers to the plausibility of the methods or study analyses of being applied widely in different scenarios (i.e., imaging protocols, presence of noise, field strengths, scanner types and configurations, manufacturers, and datasets of patients with different clinical characteristics), as well as to the completeness in the description of the method or study design, which allows it to be reproducible, and, therefore applicable more widely (i.e., generalizable).

Method and improvement
Due to the small number of 'improvement' studies, these were grouped with 'method development' studies in the presentation of risk of bias assessments.In general, 42.31% of method and improvement studies were rated as low risk of bias, 17.31% were rated as instilling J.M.J. Waymont et al. some bias concerns, and 40.38% had high risk of bias.One study did not provide enough detail for a bias assessment.The individual domain with the most frequent high risk of bias scoring was 'applicability' (how well the index measure (e.g., method) can be applied to PVS quantification more widely (i.e., in other scenarios, datasets) in the context of the review question).Reasons provided when rating applicability as having a high risk of bias included a lack of detail in the methodology, lack of validation testing, and no ability to discriminate between PVS and other lesion types.A summary risk of bias plot for method and improvement studies is shown in Fig. 6.

Application
There was low risk of bias in 59.32% of application studies, some risk in 25.42%, and high risk in 15,25%.The domain most frequently scoring as having a high risk of bias was 'includes all participants', rated high risk in 49.15% of application studies.Often, a small number of patients were excluded from application studies due to poor image quality (noise and motion artefacts) which compromised PVS quantification performance, without a statistical analysis of the relevance of missing values as a consequence.A summary risk of bias plot for application studies is provided in Fig. 7.

Overview
We conducted a systematic review of the development, improvement, and application of computational PVS quantification approaches, including studies published up to September 2023.We explored 67 approaches involving 48 methods presented as part of 45 publications, eight frameworks or suggestions for improving some of these methods, nine in-house pipelines or methodologies using different elements or combinations from some of the computational approaches presented, and two re-purposed implementations from approaches developed for other purposes, these applied to clinical research.We collated information on the main aims and outcomes of each study, and extracted and analysed data relating to MR imaging sequences and field strengths used, population types and sample sizes, method characteristics including training and validation, and results.
The number of studies developing or using computational PVS quantification approaches has increased steadily since the first identified study of this type in 2002 Kruggel et al. (2002) reaching a peak of 8 computational PVS assessment methods published in 2019, the year from which the number of studies applying these methods has been experiencing an exponential growth.Currently, research into PVS using quantification approaches comprises a large field, favoured by the increase in the ability to analyse large datasets for which visual scores are lacking, and the increased interest in the function and relevance of PVS in brain health.
The use of a morphological filter to enhance PVS-like structures, with Frangi being the most commonly used, and the use of a U-Net configuration with or without residual connexions, were the two most widely applied and promising approaches, which have been independently validated.The commonest PVS feature (s) that are measured are volume and count, obtained from the PVS segmentations.From them, PVS volume, absolute, or as a percentage in the region of interest, appears to be the most consistent PVS parameter used by the clinical studies that applied a computational approach to assess PVS.Individual PVS morphology (i.e., length, width, tortuosity, sphericity) are generally calculated using ellipsoid, tubular or skeletonisation approaches, but drawbacks of these conventional methods have been recently exposed and challenged by the use of Bèzier curves (Duarte-Coello et al., 2023).
Imaging data from older adults, or a wide age range of adults, has most frequently been used in both the development and application of computational PVS quantification approaches, consistent with studies using non-computational approaches to PVS quantification (Francis et al., 2019).MR imaging was most often obtained from 1.5T and 3T imaging, with a recent increase in the use of 7T imaging.T2w imaging was most commonly used across all study types, often in combination with T1w and FLAIR imaging, these mainly being used as part of the frameworks developed for eliminating confounds and generating regions of interest, although a number of application studies used T1w only for assessing the PVS.It has been noted, however, that vessel calcifications, also hypointense in T1w images, can confound the accuracy of PVS identification, and, therefore, segmentation, if only T1w images are used (Valdés Hernández et al., 2024), falsely inflating the number of PVS detected.We limited the scope of the present review to static, structural imaging of PVS, as opposed to including fluid dynamics, functional imaging, and/or cerebrovascular reactivity studies which may make greater use of alternative imaging modalities, including diffusion tensor imaging, 4D-flow, positron emission tomography (PET), single-photon emission computed tomography (SPECT).The contribution of each MR sequence contrast and their combinations to the detection of PVS is still unclear and deserves further study.Meantime, application studies using PVS quantification methods should be aware of the differences in MRI sequences' sensitivities and their combinations, when drawing conclusions on the effect of PVS burden in relation to clinical and demographic variables.
Special attention should be given at the differences in the PVS detectability at different magnetic field strengths (i.e., 1.5T vs. 3T vs. 7T).This, together with the spatial resolution in terms of voxel sizes, greatly influence the PVS quantification results.Assuming a perfectly calibrated and error-free quantification method, images obtained at 7T are expected to show more PVS than those obtained at 1.5T without this necessarily meaning that the brains scanned at these two different magnetic field strengths have different burden of PVS.Equally, quantification methods can be susceptible to partial volume effects due to different voxel sizes (Duarte-Coello et al., 2024).Therefore, it will be advisable for application studies using PVS quantification methods and images acquired from multiple scanners and imaging protocols, to Fig. 6.Risk-of-bias summary plot for method development and improvement studies.divide the clinical sample by MRI protocol/magnetic field strength, separately tune the preferred PVS quantification method to the specific image subset, perform the analyses separately, and, finally, meta-analyse the results.

Method development studies
The most common approaches to computational PVS quantification were thresholding, thresholding and filtering, and deep learning (CNN) methods.Deep learning approaches have become almost ubiquitous since 2017, with 19 of the 48 method development studies using CNN methodology.They have higher processing speed and capacity than the more conventional methods, and a single architecture can be designed and trained to separately assess different confounding features, e.g., lacunes, PVS, and WMH.But the large number of parameters that CNN architectures manage and need to adjust require a breath of size and variability in the training dataset impossible to achieve at present.Then, as the resultant model is overfitted to the imaging parameters and clinical presentations of the dataset used for training, these models would need to be re-trained for every different cohort at hand.This entitles a reasonable number of ground truth manual segmentations, not always feasible and highly observer-dependant.While some of the architectures reviewed here have been made publicly available, we could not find any pre-trained CNN model to independently test the accuracy levels reported in their original publications.The conventional approaches that threshold the output from a morphological filter that enhances PVS-like structures, on the other hand, do not require large amount of data to be tuned, but still the threshold and some parameters would need to be adjusted depending on the imaging protocol and image quality.Also, separate developments would be required to remove confounding pathology and deal with artefacts.In terms of reproducibility, sustainability, and applicability, both types of approaches, being fully automatic, could reach high levels, but these would depend, for now, on the skills of the computer programmers that generate and manipulate them, and data availability.
Computational segmentation pipelines used different pre-processing steps.General steps involved co-registration of the different MRI sequences to the one in which PVS are assessed, and intensity normalisation.No justification of the intensity normalisation approach has been given or evaluated in any of the studies.A study classed as improvement (Valdés Hernández et al., 2024a) only evaluates the impact of the point, within the pipeline, in which a linear intensity normalisation is done.Evaluation of the impact of the type of intensity normalisation in the results is currently lacking.Few methods perform bias field correction.But the similarity indices and performance metric of all the methods presented is similar, raising questions about the impact bias field correction may have in the overall performance of the method.Few methods involve denoising too, which, contrary to the correction of inhomogeneities in the b1 magnetic field aimed at filtering out the low-frequencies, is designed to remove the high-frequencies in the image.But given the nature of the task, the application of a band-pass filter in the preprocessing stage may not be the best option to consider.Another preprocessing step worth mentioning is the generation of regions of interest (ROIs), mostly carried out using freesurfer.In some other cases, ROIs are the grey and white matter tissue classes.Given the importance of this step, its validation in different scenarios would have been required.For example, in the presence of narrow ventricles with disconnected horns resembling tubular structures, and in the presence of calcified vessels resembling tubular structures that are also hypointense in T1w images, to mention just two of the most common scenarios (Valdés Hernández et al., 2024a).
Computational PVS quantification approaches were validated against a range of reference standards; most often visual rating scores (27.08% of studies), although there was a relatively high proportion (22.92%) of them which used manual segmentations as a reference standard.While visual rating scores may be less time consuming and applicable to much larger sample sizes than producing manual segmentation data, high correlation between segmentation results and visual scores does not necessarily mean that the segmentations are good.González-Castro et al. (2017) showed that neuroradiological assessments can be subconsciously influenced by the degree of the overall disease present.Therefore, it is likely that, for example, visual rating scores of PVS in the basal ganglia in patients with SVD would be highly correlated with PVS segmentations, but perhaps not so in the basal ganglia of obstructive sleep apnoea patients that would not necessarily have any other confounding imaging feature in their brain scans.Further exploration of the correlation between different computational parameters with visual scores in cohorts with different characteristics is needed.Manual segmentations, on the other hand, enable comparison of more PVS metrics (e.g., volume, width, length) than visual-only assessments, but they have been only available for a limited number of scans or for one slice per region per scan, and most of the studies lack reports of inter-annotator reliability assessment of the manual annotations used.
Not all of the method development studies provided information on the performance or validation of their method, with three studies not using any reference standard of ground truth.Performance metrics used in those that did report results varied, although more common measures included correlations between computational and non-computationally derived PVS values, number of false positives/false negatives, and spatial similarity (Dice coefficient).It is important to note that method development studies reporting the strongest validation results between computational and manual/visual PVS quantification often had either small sample sizes or a significant user-intervention aspect to their methodology in their validation datasets.For example, Wuerfel et al. (2008) reported an intraclass correlation coefficient of 1 (a perfect score), but data from only 4 participants were used, and Ranti et al. (2022) report a DSC score of 0.99 (almost perfect spatial overlap) but their method includes a final 'manual correction' stage.It is also worth noting that differences in MRI sequences, field strengths and voxel sizes can directly impact the performance of the algorithms and, therefore, the common metrics used to evaluate them (e.g., Dice coefficient) if images acquired at various field strengths are jointly used.
A large proportion of method development studies lacked sufficient information with regards to parameters to be adjusted in methods using filtering and thresholding approaches, differentiation between PVS and other lesions (notably lacunes, which can have similar appearance), or applicability (and adjustments required) when using multi-site or multimodality imaging.Some studies provided little to no information about MR equipment and sequences used including critical factors like voxel size and slice thickness, nor about the population from which imaging data were derived or the process of participant selection.This is reflected in the large number of studies rated as having a high risk of bias for applicability.Deep learning is able to deal with changes in voxel size and slice thickness as long as the model has been trained with such a data.The vanilla implementation of the Frangi filter, for example, is also not able to deal with anisotropic data while the implementation of the Jerman filter in Matlab is.This information, therefore, impacts in the subsequent number of applications using the methods developed.
Given the varied approaches to assessing and reporting performance of novel PVS quantification methods, it may be beneficial to work towards a standard for validation and reporting.For example, a dedicated open-source PVS quantification validation dataset/repository would be worthwhile, with representative PVS and SVD, image sequence, image quality, and multi-site imaging, and phantom data.This would enable validation of novel methods by developers themselves and reproducibility assessments from outside parties.It is worth noting that although most method developments have used data acquired at the commonly used field strengths and from relevant populationse.g., older individuals who are therefore likely to have bigger or more visible PVS, some atrophy, and confounding pathologies like WMH -, this does not imply that they would yield successful results if applied without variations, to MRI data from more, less, or simply different disease and age populations, even if acquired in the same scanners.It may also be of benefit to identify minimum reporting standards for validation performance, such as agreement on a preferred reference standard, minimum required validation sample size, and relationship, accuracy, and spatial overlap metrics.

Improvement studies
We identified eight studies concerned with improving computational PVS quantification.Targets for improvement included maintaining PVS detectability with image compression (Paz et al., 2009), reducing noise and motion artefacts (Bernal et al., 2020(Bernal et al., , 2021(Bernal et al., , 2022;;Valdés Hernández et al., 2024a;Zong et al., 2021), comparing filtering approaches when applied to synthetic data (Bernal et al., 2022;Duarte Coello et al., 2024), works on physical and in-silico phantoms for establishing the limits of validity of PVS assessment methods (Duarte Coello et al., 2024), comparing PVS segmentations from T1w vs. T2w (Valdés Hernández et al., 2024a), drawing recommendations for threshold selection (Valdés Hernández et al., 2024a), tuning of filter parameters (Duarte Coello et al., 2024;Valdés Hernández et al., 2024a), and improving the calculation of PVS morphometrics (i.e., width and length) (Duarte Coello et al., 2023).While compression, noise, and motion artefacts appeared to have viable proposed solutions, one study identified remaining issues with differentiation between PVS and other hyperintense (in T2w) features, finding none of the more commonly used filter methods (Frangi, Jerman, RORPO) were effective at differentiating lesion types.This review has identified two ways in which methods have approached this issue: 1) by excluding WMH from the PVS ROIs, and 2) by combining multiple sequences simultaneously (e.g., Boespflug et al. (2018)), identify as PVS the voxels with CSF-like intensities in T1w, T2w, FLAIR and PD, while Sepehrband et al. (2019) segment on the combination of T1w and T2w and look for the intensity profile in FLAIR excluding the voxels on the top 10 percentile of the FLAIR signal distribution).But both approaches have drawbacks.The first one does not account for PVS inside WMH, consequently reporting few PVS in patients with wide and confluent WMH, just because the "clean" ROIs were small.The second one could also lead to erroneous results depending on the sensitivities of the MRI sequences involved, especially in the presence of thick slices and inter-slice gaps.Distinguishing between PVS and other hyperintense lesions remains an important target for improvement in computational PVS quantification.

Application studies
The three most frequently applied computational PVS quantification approaches identified in this review were the one firstly proposed by Ballerini et al. (2016) and implemented as part of different frameworks (24 applications), those derived from the work of Lian et al. (2018) (10 applications), and morphological detection as proposed by Schwartz et al. (2019) (6 applications).Ballerini et al. (2016) approach involved a Frangi filter approach with thresholding optimised based on visual PVS scores.Lian et al. (2018) approach was a deep learning (CNN) method, involving image enhancement and denoising stages prior to deploying a fully-automated CNN, trained on manually segmented PVS data.Schwartz et al. (2019) segmented PVS using a stepwise local homogeneity search of a white matter-masked T1w image, with FLAIR hyperintensity constraints and further linearity, width, and volume constraints.While these three approaches are the most frequently implemented, this is driven by the research groups in which the methods were developed, rather than an indication of a consensus on which PVS quantification approach to use based on methods' technicalities.
Factors identified in common among application studies included age, sex, hypertension, diabetes, WMH, sleep, and cognition, although comparable results were only available for a modest number of studies per factor.Other factors, like ethnicity, are starting to emerge, with two studies (Charisis et al., 2023;Shih et al., 2023) reporting differences in the spatial distribution (Charisis et al., 2023) and volume fraction (Shih et al., 2023) of PVS across racial/ethnic groups.Computational PVS measures used varied, but commonly included PVS count and volume, often segregated by region (e.g., basal ganglia PVS, centrum semiovale/white matter PVS).Increased age, hypertension, WMH, and sleep parameters were broadly associated with computational PVS metrics, although the only one that consistently (i.e., in all the nine studies that explored it) showed an association (in this case positive) with PVS burden was WMH either total volume or visual scores.This was followed by hypertension, with eleven out of twelve studies identifying a positive association between hypertension and PVS burden.The associations between age and PVS burden and hypertension and PVS burden have been previously documented in the literature, and been identified in non-computational studies (Francis et al., 2019).However, this review, different from previous reviews, as includes recent and large population studies involving adults from the whole lifespan or children and adolescents, points at the existence of various patterns of associations and correlations between age and PVS burden, with no association (Piantino et al., 2020;Sotgiu et al., 2023) or even negative associations in some brain regions (Lynch et al., 2023) in early life, a negative correlation until the mid-30s (Kim et al., 2023a), and a positive association/correlation afterwards with different strength degrees (Kim et al., 2023a).Other equally large population studies covering a large lifespan point to some factors that may relate to these findings.For example, (Shih et al., 2023) report that body mass index (BMI) influenced the basal ganglia PVS burden across middle-aged and older participants.A large GWAS study (Duperron et al., 2023) investigating the associations of small vessel disease markers found that 24 genome-wide significant PVS risk loci were associated with white matter PVS only in young adults, suggesting a genetically linked effect of age in PVS burden.In relation to sleep, the eight studies that explored its association with PVS evaluated different parameters: sleep efficiency, interruptions, quality, duration, falling asleep at different times during the day, for just mentioning some.
But all results suggest a positive association between PVS burden and poor sleep health.The dimension of the relationship between sleep and PVS (e.g., in clinical populations with different characteristics) has been less extensively explored, and may be a good candidate for further research.
Associations between sex and PVS burden were mixed, with two studies out of nine finding no association (Kim et al., 2023b;Wang et al., 2016a), one reporting that PVS burden was higher in females (Sepehrband et al., 2021), and six reporting that PVS burden was higher in males.These findings reflect the broader literature, which often has not reported on sex differences, or reports that potential sex differences in PVS burden remain unclear.Similarly, mixed results were identified for associations between PVS burden and diabetes.Of the four studies that referred to diabetes, two found no association (Ballerini et al., 2020c;Wang et al., 2016a) and two found a positive association between diabetes and white matter PVS (Wang et al., 2021;Zebarth et al., 2023).A previous systematic review and meta-analysis reported no associations identified between diabetes and PVS burden (Francis et al., 2019), but studies reporting on diabetes in relation to PVS burden are too few to draw any substantial conclusions as of yet.
PVS burden has been explored in association with cognition in a number of studies.In the present review, we found studies reported mixed evidence for poorer performance on cognitive testing in those with greater PVS burden.Two of the fifteen studies identified that examined cognition relied on a measure of total SVD burden, rather than PVS-specific associations.Both reported negative associations between SVD burden and cognitive decline (particularly decline in processing speed; (Hamilton et al., 2021a;Hamilton et al., 2021b)).From the rest, three studies reported no association between PVS burden and measures of cognition (Ramirez et al., 2022;Sotgiu et al., 2023;Valdés Hernández et al., 2020), and nine indicated a positive association between poor cognition, at least in one domain, and PVS burden.But those indicating a positive association studied specific clinical populations for which cognitive impairment was one of their phenotypes: Huntington's disease (Li et al., 2023;Valdés Hernández et al., 2019), CADASIL (Karvelas et al., 2023), TBI (Hicks et al., 2023), MCI (Bown et al., 2023;Kamagata et al., 2022;Sepehrband et al., 2021), frontotemporal dementia (Moses et al., 2022), and AD (Kamagata et al., 2022).One study involving elderly patients aged 65-84 enrolled in the LADIS study (Pantoni et al., 2004), also reported a positive association between PVS burden and cognition.But all participants in this study had mild cognitive complaints, motor disturbances, minor cerebrovascular events, mood disturbances, and/or other neurological problems (Jokinen et al., 2020).Other systematic reviews and meta-analyses in recent years have also yielded mixed results as to the association between PVS and cognition (Francis et al., 2019;Hilal et al., 2018;Jie et al., 2020), with further research required to shed a light on the role of PVS by stage in cognitive decline, since PVS could increase or decrease depending on compensatory mechanisms, or as part of possibly a neuroprotective or coping mechanism, prior to or amidst continual decline in cognitive functions.

Strengths and limitations
The systematic nature of this review, the broad search terminology and inclusion criteria, and the large number of years covered (i.e., up until September 2023), are perhaps the main strengths of this work.Although there would have been sources missed due to database indexing issues, articles not being published in English language, or access restrictions, the systematic and inclusive nature of our work have allowed us to cover most of the literature on the theme published up to this date, distil, and make available data to inform future research in the field of PVS research.We have summarised and meta-analysed the characteristics, advantages, scope, applicability, and shortcomings of the computational methods developed up to date to assess PVS burden.This work has also covered the works to support the improvement of these methods, establish their limits of validity, and facilitate their use by explaining the effect of tuning their different parameters and drafting recommendations based on the evaluation of the most widely used methods in different settings.But this review covers only findings from computationally-derived measurements.Owed by the availability of databases with imaging and associated neuroradiological assessments and clinical data, still a considerable number of studies use visual scores in their analyses.For illustration, from January to September 2023 alone, our search in Web of Science identified 47 publications of clinical studies with PVS either as outcome measure or as predictor (at least one of them in some cases) that used PVS visual scores, which were excluded from our analyses, against only 23 studies that used computationallyderived metrics and were, therefore, included.Consequently, conclusions cannot be drawn from our findings on factors related to PVS burden as they only reflect results from a percentage of all the studies in this area of research.A wider literature analysis including also data obtained from neuroradiological (i.e., visual) assessments would be necessary to better inform on the associations of PVS with clinical, demographic, genetic, cognitive, and lifestyle factors.

Conclusions and future directions
Computational approaches to PVS quantification have increased in prevalence in recent years.Novel automated PVS quantification approaches often use thresholding and filtering approaches, or deep learning (CNN) approaches, each with different advantages as well as shortcomings.The method to use will, ultimately, depend on resources availability (i.e., ground truth, disposal or not of a pre-trained model, computational capacity, and skills of the researchers/image analysts).Barriers to successful computational PVS quantification, including image compression, image quality, and nature and quality of the reference standard assessments have been identified, and potential solutions have been proposed.In particular, further work is needed to address the issue of lesion type discrimination (e.g., PVS differentiation from lacunes and small WMH in T2w contrast images, and calcified vessels or mineralisation around small arterioles in T1w contrast images).Although some methods were developed using images from different scanners and magnetic field strengths, methods' performance in the same population imaged at different magnetic field strengths are not known.Best practices and adjustments for assessing longitudinal changes in PVS are also not documented.Also, most methods were developed using representative samples with wide range of PVS and confounding disease burden, but results were not analysed taking these into account.Further studies that develop methods for quantifying disease markers should analyse accuracy results by disease load in order to inform better the scope of their applicability.Computational PVS approaches are increasingly being applied in epidemiological and clinical studies, with age, WMH, hypertension, different diseases involving cognitive deficits as part of their phenotypes, and sleep, emerging as factors associated with PVS burden.More research is needed to better understand the mechanisms and mediating factors in the associations that have been consistently found, and the reasons for discrepancies in the factors that have yielded conflicting results across the studies.

Funding
This study is mainly funded by the Hilary and Galen Weston Foundation under the Novel Biomarkers 2019 scheme (ref UB190097) administered by the Weston Brain Institute.This study received also funds from the Fondation Leducq Network of Excellence for the Study of Perivascular Spaces in Small Vessel Disease (16 CVD 05); the Row Fogo Charitable Trust (MCVH, FC, JMW) (BRO-D.FID3668413); Dementias Platform UK 2, which receives funds from the UK Medical Research Council (MR/T033371/1); and the UK Dementia Research Institute at the University of Edinburgh (award number UK DRI-4002) through UK DRI Ltd, principally funded by the UK Medical Research Council, and additional funding partner the British Heart Foundation (JMW, JMJW).

Fig. 2 .
Fig. 2. Line graph showing number of computational PVS quantification papers published between 2002 and September 2023.

Fig. 3 .
Fig. 3. Comparative sample sizes of the studies in the method development category.Dubost et al. and Ramirez et al. used overlapped subsamples from the same data sources.For these cases, the largest sample is represented in the graph.

Fig. 5 .
Fig. 5. Average DSC and standard deviations (interquartile range for the schemes proposed in Sudre et al. (2022)) results for each of the schemes proposed.

Table 2
Population types and clinically-relevant demographic groups.Percentage with respect to the sources in each group (Note: in the Improvement category two studies did not use any human sample but phantoms and/or mathematical models)

Table 3
Population age groups.
Legend: N: number of sources, %: Percentage with respect to the number of sources in each group, * : Not applicable, as no human data is used (studies on mathematical/phantom approaches only)

Table 4
Range and average sample sizes (N).

Table 5
MR sequence combinations.
Legend: N: number of sources, %: Percentage with respect to the number of sources in each group * : MRI sequences are not relevant for one of the Improvement papers.** : An application study has T1w, T2w, FLAIR, and 3DFlow MRI, but only uses T1w and T2w for PVS segmentation.)

Table 6
MRI field strengths used.
Legend: N: number of sources, %: Percentage with respect to the number of sources in each group

Table 7
Method development study summaries.
(continued on next page) J.M.J.Waymont et al.

Table 8
Types of computational approaches.
Legend: N: number of sources, %: Percentage with respect to the 48 sources in the Method development group publication describing the filtering approach proposed byBoespflug  et al. (

Table 9
Computational PVS quantification improvement study summaries.

Table 10
Application study summaries.
(continued on next page) J.M.J.Waymont et al.

Table 10
PVS volume higher in older and male participants.BMI, time of day (continued on next page) J.M.J.Waymont et al.

Table 10 (
continued ) (continued on next page)