NeuroImage: Clinical

accurate quantitative imaging biomarkers that can be used to predict the future clinical diagnosis of a healthy subject, or to quantify the progression of symptoms over time. Besides, CJD is commonly mistaken for other forms of dementia. Due to the heterogeneity of phenotypes and the lack of a consistent geometrical pattern of disease progression, the approaches used to study other types of neurodegenerative diseases are not satisfactory to capture the progression of human form of prion disease. In this paper, using a tailored framework, we aim to classify and stratify patients with prion disease, according to the severity of their illness. The framework is initialised with the extraction of subject-speci ﬁ c imaging biomarkers. The extracted biomakers are then combined with genetic and demographic information within a Gaussian Process classi ﬁ er, used to calculate the probability of a subject to be diagnosed with prion disease in the next year. We evaluate the e ﬀ ectiveness of the proposed method in a cohort of patients with inherited and sporadic forms of prion disease. The model has shown to be e ﬀ ective in the prediction of both inherited CJD (92% of accuracy) and sporadic CJD (95% of accuracy). However the model has shown to be less e ﬀ ective when used to stratify the di ﬀ erent stages of the disease, in which the average accuracy is 85%, whilst the recall is 59%. Finally, our framework was extended as a di ﬀ erential diagnosis tool to identify both forms of CJD among another neurodegenerative disease. In summary we have developed a novel method for prion disease diagnosis and prediction of clinical onset using multiple sources of features, which may have use in other disorders with heterogeneous imaging features.


Introduction
Prion diseases, also known as transmissible spongiform encephalopathies (TSEs), are a group of progressive neurodegenerative conditions, which cause cognitive impairment and neurological deficits (Collinge, 2001). All prion diseases involve a change in conformation of the normal cell surface prion protein (PrP) into multichain prion protein (Parchi et al., 2010;Pocchiari et al., 2004;Siddique et al., 2010).
The different forms of prion disease can be grouped by aetiology, accordingly, whether they are sporadic (unknown cause), inherited, or acquired (transmitted between mammals or humans). The sporadic form (sCJD) is the most common and it accounts for about 85% of the annual incidence of human prion disease. This form of CJD shows a significant neuronal loss, and vacuolisation within cell bodies and dendrites that gives a spongiform appearance to the cerebral cortex and deep nuclei (Johnson, 2005). sCJD is also characterised by the rapid progression of symptoms with prominent cognitive decline. The median time of survival after clinical onset is only 5 months, and 90% of the patients die within one year (Pocchiari et al., 2004). The inherited prion diseases (IPD) are caused by autosomal dominant inheritance of mutations in the PRNP gene, which in total are responsible for 10-15% of the incidence of human prion disease (Mead, 2006). Over thirty different mutations in PRNP have been found in patients presenting IPD, with about 95% of familial cases caused by four mutations (point mutations in codons 102, 178, 200 and 210) and insertions of five or six octapeptide repeats (Johnson, 2005). IPD has often an earlier clinical onset when compared with sCJD, which also presents a wide range of clinical onset ages, from 20 to 70 years old. The clinical course of IPD can be much longer than sCJD, up to 20 years (Mead, 2006). Due to these long clinical durations, IPD has a similar prevalence in the population than sCJD, despite a much lower incidence.
The clinical diagnosis of both forms of CJD can be challenging during life, due to the heterogeneity of the observed phenotypes, particularly in the earlier stages of the disease as they can mimic other neurodegenerative diseases. While the definitive diagnosis is still only possible by brain biopsy, the improved understanding of the pathogenesis of prion diseases have allowed definition of recognizable clinical features and a replicable diagnostic criteria in vivo (Fragoso et al., 2017). The diagnosis criteria are based on a set of neurological, cognitive and psychiatric observations Thompson et al., 2013a). Moreover, noninvasive imaging techniques, such as magnetic resonance (MR), computed tomography (CT), positron emission tomography (PET) or single-photon emission computed tomography (SPECT) have also been used to diagnose CJD. The qualitative assessment of neuroimaging data has proven to be useful to identify and characterise CJD among other pathologies (Caobelli et al., 2014;Manix et al., 2015;Schroter et al., 2000;Zerr and Poser, 2002).
Several quantitative imaging measures and potential imaging biomarkers have been studied in order to improve the sensitivity and specificity of CJD diagnosis based on neuroimaging data. Siddique et al. (2010) investigated the cross-sectional, longitudinal and post-mortem cerebral magnetisation transfer ratios (MTR) as a surrogate for CJD progression. Highly significant associations were found between whole brain MTR and prion disease. Alner et al. (2012), explored the potential of using the cortical thickness as a biomarker in order to characterise IPD, especially the 6-OPRI and P102L variants. They showed significant differences in the mean cortical thickness between 6-OPRI patients and controls in temporal, cingulate, frontal, parietal and occipital lobes; whereas only the mean cortical thickness of the parietal lobe was relevant to distinguish controls from P102L patients. In a different study using MTR (Vita et al., 2015), differences between controls and symptomatic subjects were seen in the caudate, hippocampus, putamen and cortex. The brain progressive structural changes were also identified by applying longitudinal voxel-based morphometry (VBM): significantly greater rates of grey matter decline were observed, predominantly in the pons, the corpus callosum, the thalamus and the putamen, when comparing controls and symptomatic subjects (De Vita et al., 2017). Many studies have also explored the potential of fluid-attenuated inversion recovery imaging (FLAIR) in the detection of CJD (Collie, 2001;Kallenberg et al., 2006;Murata et al., 2002;Young et al., 2005). FLAIR has been shown to be reliable to detect the earlier stages of the disease than T2-weighted (T2w) and diffusion-weighted imaging (DWI). However, FLAIR is less effective than DWI to detect lesions, which become less prominent during the course of the disease (Shiga et al., 2004;Young et al., 2005). The mean apparent diffusion coefficient (ADC) has proven to be a sensitive imaging biomarker for diagnosis based on abnormalities in the caudate, putamen and pulvinar nuclei. Research suggests that brain volume loss in inherited prion diseases is followed by cerebral ADC increase, and correlates with disease severity . Besides, other DTI measures computed from DWI, such as fractional anisotropy (FA), mean diffusivity (MD) and radial diffusivity (RD) were used to assess the relevance of the putamen as a biomarker in the diagnosis of CJD (Hyare et al., 2013a;2013b).
Despite the promise of quantitative biomarkers, clinical diagnosis still relies only in the qualitative evaluation of MRI scans.
To address this limitation we proposed a framework that aims to (i) extract quantitative imaging biomarkers from MR images, (ii) show that those are fit for the diagnosis of CJD in its earlier stages, and (iii) diagnose and stratify the subjects according to severity stages of symptoms design based on a widely used clinical severity measure, the MRC Prion Disease Rating Scale Thompson et al. (2013a).
To our knowledge, this study is the first attempt to use quantitative features extracted from MR images, obtained using different MRI pulse sequences, as inputs in a classification and differential diagnosis tool, in order to capture the evolution of prion disease neuropathology over the course of the disease. Furthermore, the proposed framework also allows to assess disease severity using imaging data.
By taking advantage of tailored imaging biomarkers for classification, we believe that the proposed method can be used to achieve an early diagnosis of CJD. This it is particularly useful to improve early patient recruitment to clinical trials and/or CJD studies, and monitoring disease progression during the trial.

Feature extraction
For each subject, we extracted quantitative features from the three MRI pulse-sequences available in the dataset, which provide complementary information about brain microstructural changes caused by CJD. The framework, Fig. 1, consists of three sections: (A) data preprocessing, including artefact correction steps such as bias field correction, reduction of the partial volume effect, and correction of the effects of eddy currents in DWI scans; (B) and (C) specific feature extraction and quantification according to each MRI sequence. In section (A) both DWI and FLAIR scans were rigidly registered to T1w scans using the NiftyReg open-source software (Modat et al., 2014). We regressed out the impact of confounding effects, such as age and the total intracranial volume, by comparison with a healthy population. This correction is applied a priori to all the features extracted from different sequences.
To identify nerve cell loss and consequently atrophy of cortical and deep GM areas, we extracted volumetric information from T1w MRI scans using automated region of interest delineation. The Geodesical Information Flows (GIF)  algorithm, that relies on multi-atlas segmentation propagation, was used to parcellate the brain into multiple regions. The volume of each 128 individual brain region is then computed.
Hyperintensity abnormalities visible on FLAIR images need to be carefully considered since the degree and distribution of these histological changes tend to vary significantly among the different time of scanning (Chung et al., 1999;Zeidler et al., 2000). To characterise the degree abnormality in each subject's brain, we considered as a feature the distribution of signal intensities in GM tissues in FLAIR images. Using the Bayesian Model Selection (BaMoS) algorithm (Sudre et al., 2015), we automatically segmented the normal and abnormal appearing tissue types. Knowing that CJD mainly causes lesions in the GM tissues, we computed the Mahalanobis distance (Geoffrey McLachlan, 2004), between the normal appearing WM intensity distribution and the GM intensities for each region of interest as defined by T1w derived parcellation. The Mahalanobis distance per region, d M(GM,WM) is computed as where μ GM is the mean of intensities in each GM region, μ WM is the mean intensity of WM tissue after excluding the lesions detected as outliers, and S (GM,WM) corresponds to the covariance between the two intensities distributions.
The obtained values are a quantitative measure of signal abnormalities in GM and they can be used as a feature with the assumption that the larger the amount of hyper-intensity in a given region of interest, the larger the Mahalanobis distance. The assessment of the hyper-intensities in the brain mimics the clinical practice, in which CJD is typically diagnosed based on the presence of these signal abnormalities.
The most typical brain microstructural change caused by CJD is vacuolation, or spongiosis. Spongiosis can result from abnormal membrane permeability and increased water content within neuronal processes; however, the molecular mechanisms behind vacuolation are still unclear (Soto and Satani, 2011). Spongiosis is visible in DWI scans as an increase in the diffusion signal and it can be quantified using the MD measurements. We initially processed the DWI scans using the NiftyFit pipeline, described in Melbourne et al. (2016), in which MD measurements are computed according to (Le Bihan et al., 2001). The median MD value per ROI is computed and used as feature.

Feature selection
In the literature, common approaches for imaging classification when applied to neurodegenerative diseases usually rely on a consistent set of features among subjects. Most neurodegenerative diseases follow a common spatial pattern among subjects and across time. However, these methods are not suitable for the characterisation of CJD, as the high heterogeneity of the disease yields no consistent spatial pattern of features in the brain, neither a defined pattern of events related to disease progression.
We then hypothesized that the disease does not follow a spatial pattern in the brain. Alternatively, we assumed the imaging biomarkers can become abnormal in any location in the brain, without following a particular order. The quantification of abnormality rather than its location is thus used to infer the progression of the disease. To characterise the amount of abnormality of signal for the different types of feature, we implemented a framework previously validated with IPD data (Canas et al., 2018b), in which the different features were converted into z-scores by comparison with measurements obtained from a population of healthy subjects. The absolute z-scored values are then ranked per feature type and only the highly ranked features are considered for subsequent learning and inference stages. In this study, we selected the 15 higher ranked regions of interest per subject for each sequence independently. As a result, the array of features where N is the number of subjects, F refers to the 15 selected features and m is the number of MRI sequences considered. We decided to use 15 brain regions since it does correspond to 12.5% of the subjects sample and consequently the optimisation problem becomes well-posed. As a consequence, only regions of the brain that most differ from the healthy control sample are kept for each subject, and the resulting sets of feature are subject-specific.

Model
In order to characterise the disease status of each subject using the available multi-source features, we designed a Bayesian framework to find the function that best fits the relationship between imaging features and the subjects' diagnosis. Bayesian frameworks, such as Gaussian Process (GP), are particularly useful to study CJD, since they allow robust modelling even in highly uncertain or incomplete datasets. GP is also commonly used to perform long term predictions, and has been shown to improve performances as the number of samples increases (Roberts et al., 2012). Since GP is a probabilistic model, it provides an estimation of the probability of a subject being assigned to a certain class. Note that the class probability estimations can be used as a measure of the confidence of the predictions, which can be useful in a clinical context as a proxy of the diagnosis confidence and as an indicator of subjects' prognosis.
We implemented a non-parametric kernel-based model , as follows: (2) This model was used to predict the probability of the outcome y ∈ , for a subject = … i N {1, , }, given a set of biomarkers X ∈ feature space. For the binary discriminative case, such as subjects' diagnosis, the output of the regression model is transformed into a class probability using a cumulative density function, also called as probit likelihood function, which converts its argument, which can lie in the domain −∞ ∞ ( , ), into the range [0, 1], guaranteeing a valid probabilistic interpretation. Therefore, the posterior probability of each (2) GIF algorithm (Cardoso et al., 2011) to T1, using (3) BaMoS algorithm to extract the intensity distributions of FLAIR (Sudre et al., 2015), and computing the diffusion tensor from DWI. C: Quantitative features computed from the images obtained in the section B of the framework. class for a subject i is then given by Eq. (3), where Φ(.) denotes the cumulative density function of the standard normal distribution (Rasmussen and Williams, 2004).
For the purposes of subjects diagnosis, the likelihood of p y f x ( | ( )) i i is a cumulative density function, therefore the posterior (Eq. (4)) is analytically intractable. To address this issue, we used the expectation propagation algorithm (EP) Minka (2001) to approximate the likelihood by a local likelihood approximation (Eq. (5)).
are the site parameters, as defined in Rasmussen and Williams (2004). Therefore, the posterior f y p X ( | , ) can be approximated by: The tilde notation corresponds to the local approximation of the likelihood via EP approximation, as defined in Rasmussen and Williams (2004).
As aforementioned, the CJD phenotype is better explained by the interaction between several types of features; thus, a basis kernel function is insufficient to describe the variance of the features. Besides, it is reasonable to assume that the features extracted from one MRI sequence does not show a consistent relationship with the features extracted from other sequences, during all stages of the disease Chung et al. (1999).
The inter MRI sequence relationship can thus be modelled as a multi-task paradigma contribution of independent functions that explain the biomarkers' progressions. A sensible way to model a GP as a multi-task model is using an Additive GP. By implementing an Additive GP we are able (1) to express superposition of different processes contributing to the some output and (2) to improve model interpretability, since it provides information about relative weighting of different functions and their orders of interaction (Duvenaud et al., 2013;. The latent function f in model , Eq. (2), takes the form of where M refers to the number of MRI sequences taken into consideration in the model (Rasmussen and Williams, 2004). The imaging biomarkers were encoded in individuals kernel matrices  (Duvenaud et al., 2013;. Therefore, the matrix K, Fig. 2, which encodes the imaging biomarkers, is obtained by the addition of the kernel matrices computed individually using the information extracted from the MR sequences. Note that we estimate the kernel matrices using the ranked magnitude of the feature abnormality, encoded by their z-score, rather than their spatial information. The consistency across features is thus with respect to their subject-specific rank of abnormality rather their spatial location. Using this approach, we construct one kernel matrix per modality before combining them using an additive GP. Our model also accounts for the individualised pattern of each genetic mutation of the genetic form of prion disease, defined as a categorical variable in the kernel matrix K RP . To reduce the bias introduced by the high number of genetic mutations, we grouped the subjects in two clusters according to the expected rate of disease progression associated with each mutation: (1) slow and (2) fast, defined based in the clinical knowledge 1 about the different mutations, Eq. (7), where r is the rate of progression of a given subject, such as ′ ∈ r r , , and it is defined in the features array.
We understand that this is not actually genetic information, but rather a cluster of mutations with similar physiological behaviour. Using this kernel with the aforementioned information, we show the flexibility of our model to deal with both categorical and continuous data, such us genetic and quantitative imaging data respectively. Future work should make use of this kernel matrix to encode any other relevant genetic data, such as SNP information.
The K is lastly combined with the categorical covariance function by means of the Hadamard product, ⊙ K K RP to produce a hierarchical model, where K is the sum of the kernels used to encode imaging data.
The modified latent function f X Θ ( | ) which encodes the mutation information is where Θ is the vector of parameters of the model, which includes the hyperparameters of the kernel functions and the sample variance: = θ θ θ Σ Θ [ , , , ], The estimation of y * requires to find the best hyperparameters of each kernel covariance function. The hyperparameters Θ of the kernel functions are estimated via the maximisation of the marginal likelihood of the model, p Θ ( | ), as described in Eq. (8); i.e., the marginalisation over the kernel parameters is performed by maximum a posteriori algorithm (MAP), and the hyperparameters Θ are estimated by bootstrapping (Rasmussen and Williams, 2004).
The full model is illustrated in Fig. 2.

Subjects diagnosis
In order to diagnose symptomatic subjects among healthy controls, we modified the model to estimate the best predictions under the where 0 denotes the healthy controls and 1 denotes the symptomatic subjects. For the binary discriminative case, such as subjects' diagnosis, the output of the regression model is transformed into a class probability using a probit likelihood function. Therefore, the posterior probability of each class for a subject i is thus given by Eq. (3), where Φ(.) denotes the cumulative density function of the standard normal distribution (Rasmussen and Williams, 2004).
The imaging biomarkers are encoded in K m through the use of a squared exponential kernel function (SE) with hyperparameters where σ f 2 is the signal variance and l 2 is the length scale (Rasmussen and Williams, 2004). The SE function is widely-used within binary classification problems, especially for its main assumptions: smoothness and stationarity.
In the final section of the proposed model, illustrated in Fig. 2 by the blue section, we used the optimised model to estimate the predictive label y * for a new subject 2 = ⋯ j S {1, , } from the testing and validation sets. The predictive probability is given by Eq. 10. Similarly, the mean q  and variance q  of function f * is computed using Eqs. (11) and (12) respectively.
. Note also that the ∼ μ is the site parameter defined in Eq. (5). The tilde notation refers to the local approximation used in the EP algorithm as defined in Rasmussen and Williams (2004).
The analysis of the latent models that compose f * provide the information about the best combination of features to diagnose prion disease.

Subjects stratification
Being able to diagnose CJD at the early stages of the disease could increase participation in clinical trials, which is currently challenging as patients usually die in less than 12 months from diagnosis (Bradford et al., 2014). Therefore, the prediction of the time to clinical onset of IPD patients is one of the aims of this study.
The proposed model does not give a continuous measure of the time to onset, in years. Nevertheless, the stratification of the subjects according to the severity of symptoms, or the proximity to clinical onset stage, can be interpreted as the subject's prognosis. For that purpose we adapted the generative model 2 to predict the stage of the disease for a subject i given the set of features X. The estimated probabilistic class provides a clinical input regarding the severity of symptoms of prion disease. We implemented a multi-class classification GP based on individualised likelihood factors computed for the target classes defined by for the subject i. These classes correspond to the five predefined stages of the disease: (1) healthy control (HC), (2) asymptomatic subjects (Asym.), (3) subjects with an MRC Scale score of 20 i.e., asymptomatic 3 or early symptomatic but no accrued neurodisability, within one year inside of the clinical onset window (CO), (4) to (5) symptomatic subjects divided in 2 severity quantiles according to their MRC Scale scores (Thompson et al., 2013b). The MRC Scale is a functional composite scale measured as a 0-20 score developed using item-response modelling (Thompson et al., 2013a). Items rated include memory and orientation; continence; self care; mobility; and communication.
Due to the number of classes under consideration, f i is a vector The multi-class classification is performed by means of a multinomial probit likelihood, as defined by Eq. (13), where the auxiliary variable u i is distributed as Analogous to binary classification, the posterior of a multiclass GP is analytical intractable, thus it requires the approximation of the likelihood. To keep the consistency across classification tasks, we used EP algorithm. However, in case of binary GP the estimation of the tilted distribution (defined in Eq. (5)) requires solving one-dimensional integrals, and assuming the probit likelihood function, these univariate integrals can be computed efficiently without numerical quadratures (Minka, 2001;Rasmussen and Williams, 2004). For the multiclass paradigm the solution is more complex, since we need to evaluate the multi-dimensional integrals . The approximation of the tilde variables can be done by Laplace approximation (LA) (Rasmussen and Williams, 2004). The problem with the LA approach is that the mean is replaced with the mode of the distribution and the covariance with the inverse Hessian of the log density at the mode.
Because of the skewness of the tilted distribution caused by the likelihood function, the LA method can lead to inaccurate mean and covariance estimates in which case the resulting posterior approximation does not correspond to the full EP solution.
The marginal likelihood is then approximated via a nested expectation propagation (nEP), which does not require numerical quadratures or sampling to estimate the predictive probabilities, as detailed in Riihimäki et al. (2013).
In this particular case of the model , the imaging biomarkers are encoded using a linear combination of linear logistic kernel functions (Eq. (14)), b is the intercept of the linear part and a is the regression coefficient of the linear part. This function is used to encode the variance of the biomarkers over the different stages of the disease. In previous studies focusing on neurodegenerative diseases, it has been demonstrated that imaging biomarkers show a logistic evolution over time (Canas et al., 2018a).
Similarly, to the model described in Section 2.4, the function f is defined as where Θ is the vector of parameters of the model, which includes the hyperparameters of the kernel functions and the sample variance: The predictive probability for a new subject j, given the optimised model, is given by an extension of Eq. (11), detailed in Riihimäki et al. (2013).

Differential diagnosis
Lastly, due to its rarity, CJD is commonly mistaken for other types of dementia, which results in a higher rate of undiagnosed subjects, and patients who present at a more advanced disease stage. Improving early diagnosis could permit (1) a more effective management of the disease 2 The * notation refers to the inference for a new sample, using the optimised model. 3 The asymptomatic subjects considered as part of clinical onset (CO) had have the diagnosis confirmed in later MRI images.
symptoms and (2) planning for end of life and (3) recruitment to clinical trials. We adapted the model to be used as a differential diagnosis tool, and applied it to identify CJD among another form of dementia: young onset Alzheimer's disease (YOAD).
As described in Section 2.5, we computed individualised likelihood factors for the target classes defined by for the subject i, where (1) corresponds to healthy controls, (2) to IPD, (3) sCJD and (4) YOAD. Analogous to subject stratification task, Eq. (14) was used to encode the imaging biomarkers.

Model evaluation
We evaluated the performance of model in terms of the robustness and accuracy of the predictions. The robustness of the estimations and the stability of the results were assessed through bootstrapping.
The effectiveness of the subjects diagnosis was assessed using sensitivity, specificity, accuracy and false rate of discovery (FDR) (Sokolova and Lapalme, 2009). The receiver operating curves (ROC) and the area under the curve (AUC) were computed using the formulation for ROC graphs proposed in Fawcett (2006).
Both subjects stratification and differential diagnosis analyses were performed as a multi-task classification using unbalanced classes. Due to the multi-task paradigm and the unbalanced nature of the data, we use macro-averaging measures, generalised from the measures for binary classification evaluation (Sokolova and Lapalme, 2009). The averaging accuracy Acc (Eq. (15)), macro-recall R M (Eq. (16)), macroprecision P M (Eq. (17)) are computed for model evaluation. Note that the tp c is the true positive for the class , c and fp c false positive, fn c false negative, and tn c true negative counts respectively. The predicted label for each subject was obtained based on the class with highest probability among all the possible classes.
To evaluate the probabilistic predictions performance of the model, we also computed the multiclass logarithmic loss (Eq. (18)): where p o,c is the probability of observation of class .

Data processing
The data used in this study were obtained from the National Prion Monitoring Cohort (NPMC). NPMC (2008-) is a prospective observational interval-cohort study of patients with any form of prion disease in the UK or willing to travel to the UK. It includes regular follow-up clinical and psychological assessments of sCJD patients and patient with IPD and their relatives who may be known carriers of PRNP gene mutations, at-risk but not had a genetic test, or healthy controls. The current dataset comprises (a) symptomatic patients with confirmed prion disease diagnosis, for both the inherited and sporadic forms of the disease, defined in the following sections as IPD and sCJD respectively; (b) healthy subjects without a clinical diagnosis of IPD who carry PRNP gene mutations and are therefore at increased risk of disease in the future, defined in this study as asymptomatic subjects; (c) healthy individuals without either prion disease or increased risk, defined as healthy controls (HC). From the sample aforementioned detailed, we defined a group composed by the subjects at clinical onset (CO) (subjects within one year of a clinical diagnosis and an MRC scale of 20 or less), in order to examine specific brain changes occurring during the clinical onset. To avoid the overlap of criterias used to defined both the CO and IPD groups, the IPD group is composed only by symptomatic subjects with an MRC scale equal or lower than 20, in which the scans were acquired more than one year after clinical onset. Data from all 125 subjects include MRI scans, neurological and neuropsycological assessment and scoring using the MRC Scale Thompson et al. (2013a).
MRI was acquired using a Siemens Magneton Trio (Siemens, Erlanger, Germany) 3 Telsa with the conventional body coil for transmission and a 32-channel head-only receive coil. Structural imaging used 3D T1-weighted images (T1w) MPRAGE sequence with repetition time 2.2 s, echo time 2.9 ms, inversion time 900 ms, echo spacing 6.7 ms, flip angle 10 ∘ , matrix size 256 × 256 × 208, voxel size 1.1 × 1.1 × 1.1 mm. 2D Axial FLAIR were acquired using a standard clinical FLAIR-TSE sequence with a voxel size of 0.9 × 0.9 × 5.2 mm. The diffusion weighted imaging (TR/TE 9500/93ms) were acquired using 64 non-colinear directions at = b 1000s/mm 2 and 8 images with b = 0. For all subjects a T1w image was acquired as well as either a FLAIR, a DWI, or both.
The YOAD subjects used for the differential diagnosis experiment are part of a larger study of young onset Alzheimer's disease, for which ethical approval was obtained from the National Hospital for Neurology and Neurosurgery Research Ethics Committee. T1w and DWI images were acquired using the same scanner. The T1w pulse sequence was identical to the one used for the other data sets. The DWI acquisition however differ for the two datasets. Multiple shells were acquired for the YOAD dataset. We here used the shell that had the most similar bvalue (b = 700) than the one used for the prion data acquisition (b = 1000). We acknowledge this limitation as a potential bias. For YOAD subjects no FLAIR images were acquired.
The sample's demographics are detailed in Table 1.

Experiments
In order to assess the performance of the proposed framework, we designed four experiments to evaluate: (1) The feature selection strategy, (2) the subjects diagnosis classification performance, (3) the subjects stratification sensitivity and (4) the differential diagnosis capabilities. In experiments (2) to (4), we trained the model using 75% of the overall sample while keeping the input ratio between the different groups. The testing set corresponds to the remaining 25% of each sample. The hyperparameters of the model were optimised using an open-source toolbox 4 . In order to obtain a robust evaluation, we applied a cross-validation scheme with 500 runs for all experiments. Lastly, to understand what is the best set of MRI sequences for each task, we analysed the predicted labels obtained with the latent functions of our model. The latent functions consist of functions where individual likelihood factors depend on multiple latent variables. In this study, the latent variables are represented by the different kernel matrices that encode the imaging features. By comparing the predictive accuracy obtained by the latent functions, we can directly compare the performance of the model with different combinations of features and, consequently, infer what is the best set of features for a specific task. This analysis was performed for the experiments (2) to (4).
(1) Feature selection As previously detailed in Section 2.8.1, we grouped the subjects based on their prion disease subtypes and the existence of clinical symptoms. Using a non-parametric statistical test, we examined whether the selected features could differentiate the groups. We also implemented a multi-comparison test to investigate which set of features allows the best differentiation of the groups. The resulting p-values were corrected for multiple comparison using the Bonferroni method. All the subjects were included in this experiment, independently of the number of the MRI sequences available, since the features extraction and selection were performed separately for each set of features.
(2) Subjects diagnosis We evaluated the ability of our model to correctly diagnose subjects with the two subtypes of CJD independently. We performed the diagnosis of the subtypes of CJD separately in order to avoid the confounding effects related to the specific features of each subtype. For both IPD and sCJD subtypes, we classified only the subjects clinically labelled as symptomatic. Only the subjects with the three MRI sequences have been included in this experiment, due to the design of model which requires joint modelling of the three set of features. Note that for IPD subjects the rate of progression varies as mentioned; whereas the sCJD subtype is always considered as having a fast progression. Further, to avoid missing information for healthy controls, these were randomly assigned a value between 1 and 3 to encode a virtual rate of progression.
A squared exponential Support Vector Machine (SE-SVM) was also used to perform the subjects diagnosis. The results of SE-SVM were compared with the our model and considered as baseline.
(3) Subjects stratification Using cross-sectional data, we trained the model to perform subjects' stratification on the HC, asymptomatic subjects, IPD and sCJD. Note that in this specific experiment the sCJD and IPD subjects are jointly classified, based on their MRC Scale score. Contrarily to experiment (2), we neglected the impact of the disease rate progression modelled by the categorical kernel K RP . Here, we assumed that sCJD patients show a disease progression rate analogous to the IPD subjects with the fastest progression rate. The motivation to include the sCJD subjects in this experiment is the small sample size of IPD patients with more severe symptoms -MRC scale lower than 14 (stage II, Table 1). Therefore, we considered 5 classes: (1) HC, (2) stable Asymp., (3) COsubjects at clinical onset (early symptomatic subjects with MRC Scale score equal to 20 and within an year window after conversion), (4) S-I -Symptomatic subjects with MRC Scale score between 19 and 15, and finally (5) S-II -Symptomatic subjects with MRC Scale score below 14. Only 2 subjects had an MRC Scale score below 10, and they were included in the last group. Similarly to experiment (2), to model the different rates of progression of the subtypes of CJD and of the IPD mutations, we included the K RP in the model.
(4) Differential diagnosis: Lastly, we compared the CJD subtypes against a clinically releted form of dementia. For this experiment, we included the HC, IPD, sCJD and YOAD groups. The asymptomatic subjects have been excluded to avoid the presence of confounding effects during the training of the model. Asymptomatic subjects form indeed a heterogeneous group as individuals can be days or decades from clinical onset. The features used to characterise YOAD subjects were obtained from DWI and T1w MRI scans, which have been processed using the framework detailed in Section 2.1 and 2.2. Similarly, only DWI and T1w imaging features were considered to characterise CJD subjects. Note that the feature selection section of this framework was tailored to maximise the information related to CJD symptoms; thus, the features do not encode the spatial pattern that characterises YOAD.

Results
The proposed framework was used to extract the individual features from T1w, DWI and FLAIR, which were then used as input feature in a classification algorithm aiming either at the diagnosis or staging of the individuals. The resulting features are detailed in Section 3.1. The results of the subjects' diagnosis for both IPD and sCJD are summarised in Section 3.2. We present in Section 3.3 the results of the subjects' stratification experiment. The effectiveness of the differential diagnosis tool is presented in Section 3.4.

Feature extraction and selection
We evaluate the statistical significance of the features extracted before feature selection. Table 2 shows the brain regions and their respective p-value computed using a two sample t-test comparing the different groups with the healthy population. Neither the Asymp. nor the CO presented any significant difference in features compared to the healthy controls. The features extracted from FLAIR are insufficient to identify brain regions that are relevant to diagnose CJD. Fig. 3 shows the p-values of the brain regions significantly different from the healthy population, projected into the MNI152 linear template Fonov et al. (2009) using the MRIcroGL visualisation software. 5 The brain regions are considered as significantly different from the healthy population for the p-value < × − 3.79 10 , 04 presented in the Fig. 3 with light orange.
Due to the assumption of spatial heterogeneity of brain changes caused by prion disease, we implemented a subject-specific features extraction and selection. This approach selects the most significant features to characterise the evolution of symptoms for each subject, neglecting the spatial origin of features in the brain. Fig. 4 shows the mean of the 15 most significant features per subject and across groups.
The results of the statistical analysis performed on the features selected are detailed in Table 3. The mean of the highest ranked features extracted from both T1w and DWI are significantly different across groups, whereas the features extracted from FLAIR images have not shown statistical significance across groups.  (12) 19 (11) 19 (10) -Furthermore, using a multiple comparisons test we also analyzed which set of features allow differentiation of the patient groups. This experiment indicates that DWI and T1w features enable the diagnosis of sCJD vs HC with high statistical significance, whilst the T1w features identify the disease stage of IPD (vs Asym.) with highest statistical significance. To investigate the influence of each feature to the subjects' diagnosis performances, we evaluated the accuracy of the predictive classes obtained using the latent models. Table 4 details the accuracy of the classification of sCJD patients using different sets of features. The imaging structural biomarkers extracted from FLAIR images appear to be insufficient to diagnosis sCJD subjects at the early stages of the disease, whereas the MD measures computed from DWI scans have the strongest influence in the diagnosis of sCJD, followed by the T1w imaging biomarkers.

Subjects diagnosis
We also evaluated the predictive accuracy of the model for diagnosis of IPD subjects. Similarly to the sCJD diagnosis, the intensity based features extracted from FLAIR achieved a lower accuracy when used as single features in the model. It can be observed that including the rate of progression associated with specific mutations yields an improvement of the predictive accuracy. Note also that the full model did not Table 2 Evaluation of the statistical significance of the imaging biomarkers, before feature selection. The two sample t-test was used to identify which brain regions show significant differences between symptomatic subjects and the healthy population. The p-values indicate the test rejection of the null hypothesis at 5% significance level, considering the Bonferroni correction.  necessarily achieved the best performances for all metrics, which can be justified by the introduction of noise due to the features' interactions.
Lastly, the predictive accuracy of the SE-SVM model was also evaluated for each set of biomarkers: T1w, FLAIR and DWI. Tables 4 and 5 show that the accuracy obtained from the binary classification using SE-SVM is comparable with the predictive accuracy of our model, namely on the sCJD diagnosis. Nevertheless, our model showed a significantly lower logarithmic Loss in both tasks, which is translated in a lower uncertainty of the predictions given by our model. Therefore, even with a higher accuracy, the SE-SVM is not suitable to be used in clinical context given the uncertainty of the predicted classes.
Furthermore, the good results obtained by the SE-SVM sustained the hypothesis that subject-specific features are suitable to diagnose CJD, since these features even when used in a different classifier lead to a good identification of prion disease. Fig. 6 shows the normalised confusion matrix for the testing set. The qualitative analysis of the confusion matrix suggests that the model is able to correctly identify the extreme stages of the disease, while being less accurate in the differentiation of the intermediate stages of the disease. The results reported in Fig. 6 are deterministic and they do not account for the fuzziness of the classes estimated, particularly for the asymptomatic stage. Fig. 7 shows the correlation between the categorical (discrete) labels and the average probability given to each class, computed through bootstrapping. The probability distribution across classes gives a more intuitive interpretation of subjects' clinical status. For a clinical application, the probability of the prediction associated with the predicted label corresponds to the model confidence in its output.

Subjects stratification
We further investigated what is the best combination of features to achieve a good stratification of subjects, by computing the predictive classes obtained from the latent models. Table 6 presents these findings, where the average accuracy across stages is higher for the jointly modelling of the three set of features (Average Acc = 85 ± 4%) combined with the progression rate of the individual mutation. The logarithmic loss, computed across classes, is also lower for the full model ( = ± 1.74 0.44), supporting the assumption that by using the three MRI sequences we better explain the CJD symptoms and we are able to infer the subject's prognosis. Moreover, the low logarithmic loss translates the higher certainty of the classes correctly label, whilst the  HChealthy controls; Asym.asymptomatic subjects; COclinical onset; SIstage I and SI -stage II of the disease.

Fig. 5.
Predictive accuracy of the model for both IPD and sCJD subjects, when considering a dataset composed by the three MRI sequences. The predictive accuracy for both IPD and sCJD subjects, using squared exponential SVM (SE-SVM) is also computed using the three modalities. The ROC curves are computed considering the predicted labels of 500 iterations, as proposed by Fawcett (2006). classes wrongly predicted have often higher uncertainty related to them.
As the model has not been re-trained for each latent model individually, the performance on the latent models that include DWI and FLAIR are very similar. This is due to their limited influence on the full model.

Differential diagnosis
By modelling the joint contribution of the three sets of features it is possible to achieve a good differentiation between the symptomatic CJD subjects and the YOAD patients, as reported in Fig. 8. The confusion matrix also shows that only approximately 8% of CJD subjects are labelled as HC. We presume that the slower rate of progression of IPD subjects and the higher number of subjects with MRC scale of 20 lead to less evident symptoms and consequently a proximity to the HC biomarkers pattern.
The results also indicate that 75% of the YOAD subjects have been correctly labelled, showing a probability of 0.61 of being YOAD (Fig. 9); whereas the CJD subjects have shown a probability of 0.35 of being wrongly labelled as YOAD subjects, with a higher uncertainty in the differentiation of IPD and YOAD. The overlap between these two classes is explained by the lack of a specific kernel matrix to explain the spatial differences between the two diseases. Our approach only correlates the magnitude of symptoms and the correlation between the features Table 4 Evaluation of the full model used for subjects diagnosis, for Sporadic CJD (sCJD). The mean value and standard deviation of 500 iterations is computed for all the metrics used for performance evaluation. The AUC is computed considering the results of all iterations. The false discovery rate (FDR) is also evaluated. All the evaluation measures are presented in percentage, excepting the AUC and the . SE-SVM stands for Support Vector Machine with squared exponential kernel.    selected to a specific form of dementia. Nonetheless, this section is an illustrative example of the flexibility of the proposed model and the possibility of being used as a differential diagnosis tool, particularly to identify CJD among other types of dementia. The analyses of the latent models performance show that the categorical kernel used to encode the progression of IPD mutations, highly improves the accuracy and precision of the differential diagnosis tool. Fig. 10 highlights that the combination of features from DWI and FLAIR is the most sensitive to differentiate IPD from YOAD, whereas T1w images combined either with DWI or FLAIR show better results in the identification of sCJD among YOAD patients. The micro-structural changes happening in the brain of IPD symptomatic patients visible in DWI and FLAIR can be used as the main feature to distinguish this type of CJD from other neurodegenerative syndromes.

Discussion
In this study we proposed a novel framework to extract quantitative imaging biomarkers to diagnose and stage prion disease patients. These biomarkers were also used to characterise the disease stage within a stratification task. To the best of our knowledge, this work is the first attempt to extract quantitative imaging biomarkers to be used in an automated diagnosis tool for prion related applications.
Due to the high heterogeneity of the clinical manifestations of prion diseases, it is very challenging to select useful biomarkers that may be used to comprehensively characterise all the different subtypes of prion disease and to perform an accurate diagnosis.
Previous studies have shown signs of atrophy in temporal, cingulate, frontal, parietal and occipital lobes caused by IPD (Alner et al., 2012). By using the structural biomarkers obtained via the proposed framework, we also identified statistical differences in the occipital gyrus, specifically in the cuneus, for both IPD and sCJD. The left and right hippocampus and central opercullum had been identified as meaningful regions to identify CJD, as suggested by De Vita and collaborators (Vita et al., 2015). However, we were not able to identify signs of atrophy in the temporal and parietal lobes. Our analysis also identified signal abnormalities in DWI scans. Statistical significant differences were observed in the sCJD sample, when compared with controls, in the left and right entorhinal areas, cerebellar vermal lobus I-VII. In turn, DWI signal differences were observed in the right cuneus, anterior cingulate gyrus, angular gyrus and central operculum for IPD subjects. Previous studies (Shiga et al., 2004;Young et al., 2005), have reported signal abnormalities in DWI scans in the caudate, putamen and pulvinar nuclei. Our analysis did not reveal relevant changes in those regions (Table 7).
By comparing the imaging features extracted from healthy controls and symptomatic patients, we observed that CJD disease burden weighs equally on each hemisphere. Furthermore, despite the initial assumption of spatial heterogeneity of the brain changes, we identified some regions with higher prevalence among subjects with the same form of CJD (Fig. 3). These results are explained by the broad spectrum of symptoms stages found in the IPD and sCJD groups. Due to the spatial heterogeneity of the symptoms, patients at similar disease stage exhibit abnormality in non-consistent areas of their brain. Therefore, eventbased models previously used to study other forms of dementia, in which a spatial homogeneity is seen for subjects at the same stage, cannot be used in the context of CJD Young et al. (2005).
Based on the clinical assumption that prion disease is highly heterogeneous even among subjects with the same mutation, we chose to extract subject-specific biomarkers. Thus, by extracting subject-specific Table 6 Performance of the proposed approach when used for disease staging. The mean value and standard deviation of 500 runs is computed for all the metrics used for multiclass evaluation. The values are presented in percentage, excepting the logarithmic Loss.    9. Likelihood of the predictive classes obtained from the differential diagnosis framework. The discrete confusion matrix is normalised by the number of subjects included in the classification task. The shadow area is the average distribution of probabilities per class.
biomarkers, we ensured that the lack of spatial pattern of biomarkers does not compromise the extraction of features that track subtle brain changes. The extracted imaging biomarkers (Section 3.1) have shown significant differences between healthy controls and symptomatic subjects, for both IPD and sCJD. Nonetheless, the intensity based features, computed from FLAIR images, did not show statistical relevance to separate symptomatic subjects from healthy controls. Currently, only MRI features are considered. For a better accuracy in prediction of prion disease severity or onset, we intend to include quantitative features from other sources, such as blood and CSF biomarkers. The biomarkers were then used in a non-parametric Bayesian approach to predict the subjects status. The predictive labels are based on the joint modelling of the biomarkers pattern by a Gaussian Process, producing a probabilistic labelling for each subject. We diagnosed independently the sCJD and IPD, evaluating the predictive accuracy of the labels for both subtypes. The reported results, detailed in Section 3.2, are indicative of the effectiveness of the model to detect prion disease patients, among healthy controls. Furthermore, the model was also able to diagnose subjects in the early stages of CJD, particularly for IPD symptomatic subjects with MRC Scale of 20, at which time the diagnosis can be otherwise very challenging. The results also suggest that the diagnosis can be achieved without all three MRI sequences; i.e., the biomarkers extracted from DWI scans and the jointly modelling of FLAIR, DWI and the rate of disease progression are equally predictive for sCJD and IPD when compared with the joint modelling of the three MRI images.
By comparing the predictive accuracy of our model with more simple frameworks, such as SE-SVM, we showed that our model is able to identify symptomatic prion disease subjects with lower uncertainty. Therefore, our model is more relevant in clinical context. This analysis could yield to improvement of the clinical workflow, since it provides information about which imaging biomarkers are more useful for the earlier diagnosis of CJD and thus avoiding unnecessary exams.
Notwithstanding the promising results obtained for subjects' diagnosis, there is not an effective characterisation of the different stages of the disease, neither the prediction of clinical onset for IPD patients. This can in part be due to the fact that our model only takes into consideration cross-sectional data and a reduced number of subjects who exhibit high heterogeneity of symptoms.
To improve our knowledge about the evolution of the disease over time, we extended the initial model to perform subject's staging according to the MRC Scale and by consequence the severity of brain changes. The new model can be seen as a disease progression model, defined as an additive multi-class GP. Contrary to other disease progression models currently used for other neurodegenerative diseases (Lorenzi et al., 2017;Oxtoby et al., 2018), our model does not assume a known ordering of events to stage the subjects in specific clinical status, neither an expected time-to-onset based on the familial clinical onset. Alternately, our model finds the correlation between subjects at the similar stage of the disease, by means of the covariance kernel function. The predicted stages of the disease are then computed based on the highest probability across classes. The overall accuracy (Average Acc = 85%), suggests that the model has been successful in stratifying the subjects based on their MRC Scale score. However, the analysis of the confusion matrix (Fig. 6) suggests that the model is not sensitive to classes with close intervals of the MRC Scale values. The creation of well defined clinical stages of CJD goes beyond the scope of this work, but a future study should investigate alternatives to MRC Scale to define the labels used to train the model for subject's staging. Furthermore, the current definition of CO is based on the functional disability (MRC Scale equal to 20) and the defined window of one year from the clinical onset; attending to the fast rate of progression of CJD and its diverse evolution among patients, we would need to investigate the best criteria to define this class. Similarly, Asymp. IPD are a very heterogeneous group that should also be stratified itself, in order to have a more accurate and sensitive training of the model.
Thanks to the probabilistic nature of the predictions, the model gives information regarding the predicted class for a given time-point, but also the closest classes for that time-point. This information can be Fig. 10. Predictive classes of the differential diagnosis task.The confusion matrices show the results of the latent models for the differential diagnosis task.

Table 7
Performance of the model for the differential diagnosis. The mean value and standard deviation over 500 runs is computed for all the metrics used for performance evaluation. The average accuracy, macro precision and macro recall are shown in percentage. RP refers to the rate of progression. used as a prognosis tool, since the transition between classes can be used to infer the severity of symptoms and consequently the stage of the disease. Bearing this in mind, we intend to test the model for the prediction of the several stages of the disease for the subjects with scans before and after the clinical onset. Only four subjects had all three scans modalities available before and after onset, therefore the results are inconclusive. Moreover, considering the current design of the model, different time-points for these subjects would be modelled independently, which consists in to a source of bias in the model since there is no dependency between results for a same subject. In the future, we will include the subjects excluded from this study due to missing modalities, increasing the sample size and guaranteeing more robust and meaningful results. Lastly, we also investigated the possibility of using the model as a differential diagnosis tool. The current framework has proven to be able to recognize the individual features of prion disease among another form of dementia: young onset Alzheimer's disease. Note however that the results reported in Section 3.4 are achieved without a particular modelling for the two types of dementia. The current formulation of the model relies only on the proximity of features pattern for the subjects with the illness. In the future, we plan to adapt the model to learn what is the best covariance kernel function for different neurodegenerative diseases. Following the good results obtained in this illustrative example, we intent to develop a new diagnostic tool based on quantitative measures, which should account for the uncertainty of the diagnosis, given the similarity of prion diseases to other syndromes. This new diagnostic algorithm, developed to identify prion disease among other neurodegenerative diseases, would improve the detection accuracy of this illness, and thus address the current high rate of misdiagnosis patients. 6 By analysing the predictive accuracy of the latent models, our approach also gives information regarding the combination of input features that better describes the response variable, defined here as subject's status. This means that we can extend our approach to learn the best model for a specific aim, aside of learning the best kernel function used to explain the variance of the features that characterise the subjects' symptoms.
More generally, due to their statistical nature, the performances of machine learning approaches are negatively impacted by small sample size in the presence of normal or pathological variability. Finally, a specific drawback of GP models is that their computational complexity scale poorly with the number of observations N; i.e., solving GP models requires N ( ) 3 computations. In the future, thanks to the flexibility given by the GP, we aim to extend our framework to account for the longitudinal information available. This will allow not only a more accurate stratification of subjects based on the extracted biomarkers, but also the subjects prognosis in a given time frame. We intend to do this by integrating a spatio-temporal covariance model, such as the Kronecker form proposed by Lorenzi et al. (2015), to provide a unified framework to model jointly the time-series of biomarkers measurements with different natures, for a given subject.

Conclusion
This study presents a novel framework to extract and select imaging biomarkers especially relevant for the diagnosis of prion diseases. We demonstrate that it is possible to use a non-parametric Bayesian algorithm for the diagnosis and subjects' stratification by disease severity. Moreover, the model presented in this study may also be extended to consider longitudinal data and to model individual brain changes; therefore, the model represents a promising tool for subjects' diagnosis and prognosis.

Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.