Alzheimer’s Disease Classification Using Cluster-based Labelling for Graph Neural Network on Tau PET Imaging and Heterogeneous Data

Graph neural networks (GNNs) offer the opportunity to provide new insights into the diagnosis, prognosis and contributing factors of Alzheimer’s disease (AD). However, previous studies using GNNs on AD did not incorporate tau PET neuroimaging features and, more importantly, applied class labels based only on clinician diagnosis, which can be inconsistent. This study addresses these limitations by applying cluster-based labelling on heterogeneous data, including tau PET neuroimaging data, before applying a GNN for classification. The data comprised sociodemographic, medical/family history, cognitive and functional assessments, apolipoprotein E (APoE) ε4 allele status


INTRODUCTION
Alzheimer's disease (AD) is a pernicious and progressive brain disease where neurodegeneration is commonly manifested through a variety of comorbidities such as cognitive degradation, motor impairment, speech disturbance and psychiatric changes (Burns and Iliffe, 2009). AD is a subset of dementia, which encompasses a large variety of neurological disorders. Dementia is one of the leading causes of disability worldwide, with approximately 50 million clinically diagnosed cases globally (WHO, 2020). As of 2019, AD accounts for 60-70% of all dementia cases, with the disease predominantly affecting those aged 65 and over (Ferri et al., 2005). AD generally manifests itself in two forms, familial AD and sporadic AD, of which the latter constitutes the majority (Wong-Lin et al., 2020) and is the focus here.
With the availability of increasingly complex and heterogeneous dementia data and still sub-optimal dementia diagnostic procedures, decision support systems aided by machine learning (ML) algorithms are gradually becoming important (Wong-Lin et al., 2020). Recently, one specific ML approach, convolutional neural networks (CNNs), has been increasingly used to detect AD (Wen et al., 2020). In particular, graph neural networks (GNNs), which are generalised versions of CNNs (e.g. graph convolutional neural networks, GCNs), allow the number of network node connections to vary, and their nodes are not ordered in a Euclidean manner (Gori et al., 2005; Scarselli et al., 2009). GNNs also have the potential for interpretability (Ying et al., 2019). GNNs have been applied to a variety of biomedical or disease diagnostic applications in recent years (e.g. Parisot et al., 2018; Sun et al., 2021) due to their innate ability to handle diverse and complex graph-structured data (Gori et al., 2005; Scarselli et al., 2009). By using a neighbourhood of nodes (representing human participants) that comprise various disease factors or variables, participants' similarities can be derived from the relationships that exist between nodes (edge weights).
With respect to predicting AD, a variety of GNN approaches have been applied. In particular, they have been applied on different types of data (Parisot et al., 2018; Kazi et al., 2019a; Kazi et al., 2019b; Wen et al., 2020; Vivar et al., 2021; Zhu and Razavian, 2021) and even single-cell data. A limitation of such GNN-based approaches was graph rigidity, where the final graph structure could not readily accommodate new nodes. One example is Parisot et al. (2018), where a new model iteration had to be trained with each new node addition. More robust GNN techniques have been proposed (Zhu and Razavian, 2021; Song et al., 2021). In particular, Song et al. (2021) proposed a flexible GNN to solve the fixed graph structure problem by adopting a meta-learning strategy, specifically metric-learning, which infers node similarity using a trainable similarity function and facilitates the use of heterogeneous data types, including magnetic resonance imaging (MRI) data.
Complementing the supervised learning of GNNs, unsupervised learning methods such as dimensional reduction, clustering and data visualisation have been used to provide further insights into heterogeneous AD data (e.g. Alashwal et al., 2019; Katabathula et al., 2021). In particular, some studies using clustering methods identified sub-populations that were relatively homogeneous based on clinical and biological features (Gamberger et al., 2016; Martí-Juan et al., 2019; Ferreira et al., 2019; Prakash et al., 2021; Alexander et al., 2020; Alexander et al., 2021), differing rates of cognitive decline among sub-groups of prodromal AD patients (Gamberger et al., 2017; Alexander et al., 2020; Alexander et al., 2021), and sub-groups for stratifying treatments (Prakash et al., 2021).
GNN studies, to date, have not made use of positron emission tomography (PET) neuroimaging data specific to tau, one of the key lesions in AD (Wong-Lin et al., 2020). Studies have suggested that tau PET brain images readily match the distribution of tau deposits reported from histopathological studies, brain atrophy, hypometabolism and overall severity of AD (Saint-Aubert et al., 2017), and might even be better than amyloid PET and MRI in AD prognosis (e.g. Ossenkoppele et al., 2021). Among the studies using clustering, none have used the identified clusters for GNN-based AD classification.
Moreover, although clinician diagnosis of AD can be subjective and not always consistent, no study has used more objective, data-driven labelling of classes based on unsupervised ML. In particular, from an unsupervised cluster-based viewpoint, although previous studies demonstrated benefits in identifying sub-groups of patients, it is unclear, in the presence of tau PET neuroimaging data: (i) whether there are clear-cut AD clusters and, if not, whether it is plausible to re-label AD class(es) for subsequent use with a GNN; and (ii) whether a GNN based on the re-labelled classes can perform sufficiently well compared to a GNN trained on clinician diagnostic labels.
In this study, we first apply nonlinear dimensional reduction on a heterogeneous dataset, which includes tau PET neuroimaging data co-registered with MRI. This is followed by data clustering and an investigation of the feature characteristics of the individual clusters. Next, we re-label cases based on cluster information, validated by tau PET data, to form new AD and non-AD classes for the GNN's AD classification. Finally, the GNN's performance using the re-labelled data is compared with that using clinician diagnosis.

Data Description
The dataset used in this study was obtained from the open Alzheimer's Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu), particularly the ADNIMERGE-3 open repository. Only ADNI participants who had undergone both MRI and tau PET scans (for detecting tau deposition) were extracted, and the MRI and tau PET data were merged with sociodemographic, medical/family history and neuropsychological features as measured at study baseline. The final dataset comprised 224 features: 7 sociodemographic and medical history features, 40 cognitive and functional assessment (CFA) scores, and 177 neuroimaging features (from combined MRI and tau PET imaging data; see below). Clinician diagnosis of participants, used as a class label for training the GNN model (see below), consisted of control normal (CN), Alzheimer's disease (AD), and mild cognitive impairment (MCI, which includes the prodromal stage of AD).
The sociodemographic and medical/family history features were selected from each participant's medical and sociodemographic profile. These features were age, gender, years of education, maternal and paternal family history of AD, and number of copies of the APoE ε4 allele (abbreviated as APoE4). Reprocessed using 1-hot encoding, the APoE4 feature can take a value of 0, 1 or 2, representing the number of copies of the APoE4 allele. (Ossenkoppele et al., 2021), which has been recently approved by the U.S. Food and Drug Administration (FDA) (Jie et al., 2021).
In total, a dataset comprising 559 samples, each representing an individual participant, was analysed after data merging and pre-processing (see below). Table 1 shows a statistical summary of some sociodemographic and clinical data, with comparable average ages and years of education across the 3 diagnostic classes of CN, MCI and AD.

Data Preparation
Prior to data pre-processing, sociodemographic, medical/family history, CFA and neuroimaging data were combined using each participant's unique identifier (ID), which ensured that every neuroimaging datapoint was merged with its respective participant profile. Inevitably, some participants missed a portion of the assessments other than the MRI and PET brain scans, which resulted in missing values. Although the dataset was well accounted for, outliers persisted as a direct result of those missing values. Relatively simple and sound imputation techniques were adopted in favour of more technical approaches, as they tend to provide competitive performance without the computational and technical complexity (McCombe et al., 2022). Specifically, rows that contained sporadic missing values were imputed by single imputation (Ambler et al., 2007; Zhang, 2016), where: (i) categorical values were imputed using the mode of the respective feature; and (ii) continuous values were imputed using the mean of the respective feature.
However, participants missing a substantial portion of their cognitive assessment and medical/family history were imputed using a similarity matrix (Moeur and Stage, 1995; Boriah et al., 2008). It is important to note that these missing values pertained only to the sociodemographic portion of the feature set, affecting 6% of the entire dataset in total. The similarity matrix was calculated by taking the target participant with missing values and computing the distance between that target row and each remaining row. As the dataset was largely continuous, the Minkowski distance of order 2, i.e. the Euclidean (L2) distance, was employed to calculate the distance between each instance and the target row (distance being a function of the coordinate vectors for each participant) (Moeur and Stage, 1995; Boriah et al., 2008). The resulting matrix provided approximately 10 members that closely resembled the target row, ultimately yielding the participant of greatest similarity to use as a reference for imputation.
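The similarity-based imputation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `impute_from_nearest`, the toy rows, and the choice to compare rows only on the target's observed features are assumptions made for the example.

```python
import math

def impute_from_nearest(rows, target_idx, k=10):
    """Fill the None entries of rows[target_idx] from its most similar complete row.

    Similarity is the Euclidean (L2) distance over the features that the
    target row actually has (an assumption of this sketch).
    """
    target = rows[target_idx]
    observed = [j for j, v in enumerate(target) if v is not None]
    missing = [j for j, v in enumerate(target) if v is None]

    candidates = []
    for i, row in enumerate(rows):
        if i == target_idx or any(v is None for v in row):
            continue  # donors must be complete rows
        d = math.dist([target[j] for j in observed], [row[j] for j in observed])
        candidates.append((d, i))

    shortlist = sorted(candidates)[:k]   # the k most similar participants
    _, donor_idx = shortlist[0]          # the single closest one is the donor
    filled = list(target)
    for j in missing:
        filled[j] = rows[donor_idx][j]
    return filled, [i for _, i in shortlist]

# Hypothetical, already-scaled toy data: row 2 is missing its third feature.
rows = [
    [0.10, 0.20, 0.30],
    [0.90, 0.80, 0.70],
    [0.12, 0.18, None],
    [0.50, 0.50, 0.50],
]
filled, shortlist = impute_from_nearest(rows, target_idx=2, k=2)
```

Here row 0 is the closest complete row, so its third value is copied into the target; the shortlist plays the role of the roughly 10 similar members mentioned in the text.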

Left column: clinically diagnosed control normal (CN), mild cognitive impairment (MCI, which includes the prodromal stage of AD) and Alzheimer's disease (AD).
medRxiv preprint doi: https://doi.org/10.1101/2022.03.03.22271873; this version posted March 6, 2022. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-ND 4.0 International license.

Once all missing data imputation had been completed, the dataset was normalised. Columns containing negative values were first isolated and shifted by adding the absolute value of the column minimum to every value in that column:

$$x' = x + \lvert \min(x) \rvert \tag{1}$$

for some variable or feature $x$. Lastly, all features were scaled using min-max normalisation:

$$x'' = \frac{x' - \min(x')}{\max(x') - \min(x')} \tag{2}$$
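The two normalisation steps can be illustrated with a short sketch. The function names and the toy column are hypothetical, and mapping a constant column to 0.0 is an assumption that the text does not address.

```python
def shift_non_negative(column):
    """Shift step: add |min| to every value when the column contains negatives."""
    lowest = min(column)
    if lowest >= 0:
        return list(column)
    return [x + abs(lowest) for x in column]

def min_max(column):
    """Min-max step: rescale a column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: no spread to rescale (assumed convention).
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

raw = [-2.0, 0.0, 1.0, 6.0]          # hypothetical feature column
shifted = shift_non_negative(raw)    # [0.0, 2.0, 3.0, 8.0]
scaled = min_max(shifted)            # [0.0, 0.25, 0.375, 1.0]
```

Because min-max scaling is unchanged by adding a constant to a column, the shift step does not alter the final scaled values; it is included here only to mirror the procedure as described.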

Tau PET and MRI Data Pre-processing
The pre-processing steps for the PET and MRI data are outlined as follows. Each PET scan was co-registered with its associated MRI scan before being normalised to a predetermined AD template, and the resulting images were corrected for partial volume effects (PVE) using the SPM toolbox PETPVE12 (Müller-Gärtner et al., 1992; Gonzalez-Escamilla et al., 2017). This reference serves as a template for deriving measures from the aforementioned MRI scans. The values were sampled from brain regions defined by the Desikan-Killiany atlas (Desikan et al., 2006), with the resulting labels formatted using this custom template. Ultimately, this process yielded 177 neuroimaging features per participant based on their combined PET-MRI data. Figure 1 shows the distributions of tau levels for two sample brain regions, namely the (white-matter) right entorhinal cortex and the left amygdala, for clinically diagnosed classes (CN and MCI combined, and AD) and for all participants.

Unsupervised Learning, Feature Selection and Class Re-labelling
After data normalisation (see Section 2.2), the dataset of 224 features and 559 samples was subjected to uniform manifold approximation and projection (UMAP) for dimension reduction and data visualisation in a lower-dimensional space (McInnes et al., 2018). Unsupervised manifold learning allows for efficient embedding of non-linear data points while maintaining the relative distance or local connectivity of those points with respect to one another. By using an exponential probability distribution, as opposed to Euclidean distances, UMAP offers flexibility in that any distance function can be applied.
For this study, UMAP was implemented with a large nearest-neighbour parameter (30 neighbours) to avoid focusing on very local structures, and the minimum distance was set to 0 to improve cluster density. Five dimensions of the UMAP space were selected, based on visual inspection of the compactness of AD diagnosis cases distributed along these dimensions (see Results). Then, 3-dimensional UMAP plots were presented for data visualisation and clustering purposes. Each data point in each UMAP cluster was first assigned its original label (CN, MCI or AD) based on clinician diagnosis before some of the data points were re-labelled (see below).
Next, we used an unsupervised learning method, k-means clustering (Lloyd, 1957; Forgy, 1965; Lloyd, 1982), to identify discrete clusters within the UMAP space. k-means clustering has also been applied successfully to AD datasets in previous studies (Martí-Juan et al., 2019; Alexander et al., 2020; Alexander et al., 2021), including the ADNI dataset. The feature characteristics associated with membership of a particular cluster were extracted using the Repeated Incremental Pruning to Produce Error Reduction (RIPPER) algorithm (Cohen, 1995), as implemented in the JRip package in RWeka (Hornik et al., 2020). Additionally, feature selection by the information gain algorithm (Larose and Larose, 2014) was implemented with the FSelector package in R (Romanski et al., 2021) to identify the top 10 features most associated with cluster membership, first for the originally labelled data and later for the re-labelled data. Once the key feature characteristics of each cluster were identified, each cluster was given an appropriate unique name.
One of the clusters identified by UMAP and k-means processing was found to consist of a majority of clinically diagnosed AD cases, and was hence named the "AD" cluster. However, there were a few clinically diagnosed CN and MCI cases in this cluster, and some clinically diagnosed AD cases outside of it. We re-labelled the CN and MCI cases in Cluster AD as AD cases, while re-labelling AD cases in the other clusters as MCI cases. These re-labelled cases were validated post-hoc based on their tau PET levels, which were found to be similar to each other while being intermediate between those of the non-re-labelled AD and non-AD cases. The re-labelled data was used for training the GNN model, which was compared to the GNN trained using the original clinician diagnosis labels (see Section 2.5).
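For readers unfamiliar with the k-means step, the following is a minimal sketch of Lloyd's algorithm on a low-dimensional embedding. It is not the implementation used in the study: the function `kmeans`, the random (Forgy-style) initialisation, the seed, and the 2-dimensional toy points are all assumptions for illustration, whereas the study clustered points in the UMAP space.

```python
import math
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct points as initial centroids
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
    return labels, centroids

# Hypothetical embedding with two well-separated groups of points.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
labels, centroids = kmeans(points, k=2)
```

In practice the number of clusters `k` has to be chosen by the analyst; the text reports 5 clusters for this dataset.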
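The re-labelling rule stated above is simple enough to express directly. The sketch below assumes each case is a (clinician diagnosis, cluster name) pair; the function name `relabel` and the example cases are hypothetical.

```python
def relabel(diagnosis, cluster, ad_cluster="AD"):
    """Apply the cluster-based re-labelling rule to one case."""
    if cluster == ad_cluster and diagnosis in ("CN", "MCI"):
        return "AD"    # CN/MCI-to-AD: non-AD diagnosis inside Cluster AD
    if cluster != ad_cluster and diagnosis == "AD":
        return "MCI"   # AD-to-MCI: AD diagnosis outside Cluster AD
    return diagnosis   # all other cases keep the clinician label

# Hypothetical (diagnosis, cluster) pairs covering every branch of the rule.
cases = [
    ("CN", "AD"), ("MCI", "AD"), ("AD", "AD"),
    ("AD", "Male"), ("CN", "Young"), ("MCI", "Catchall"),
]
new_labels = [relabel(d, c) for d, c in cases]
```

Only cases whose diagnosis disagrees with AD-cluster membership change label; everything else passes through untouched.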

Graph Neural Network (GNN)
The robust meta-learning-based auto-metric graph neural network (AMGNN) classifier developed by Song et al. (2021) was used to classify the data separately using the original clinician diagnostic class labels and the new class labels. Song et al. (2021) used a few features of the data to create the graph for the neural network classifier, with the rest of the data features processed in the context of the graph relationships. In this work, the gender, age, education and family history data were used to build the graph for the classifier. The node classification of this small graph was done by randomly selecting samples from the training dataset as a meta-task to train the AMGNN. After several meta-task training runs, the AMGNN model can then be used to classify nodes with unknown labels in a new graph (Song et al., 2021). Overall, the network consisted of 2 GNN layers, alongside a total of 4 CNN layers.
The graph structure initialisation was the same as in Song et al. (2021), except that the data features and their number were different. Specifically, a fully connected graph was initialised for a given dataset such that $G = \{V, E\}$, where $V$ represents the vertex set (participants) and $E$ represents a set of pairs of vertices (graph edges). All weights of the edge matrix were initially assigned a value of 1. Each node $v_i$ was represented by the following set of features (Song et al., 2021):

$$v_i = \{ l_i, \eta_i, c_i, t_i \}$$

where $l_i$ defines the numeric classification of each label, $\eta_i$ represents the risk factors (e.g. age, gender, years of education), and $c_i$ represents the CFA scores. Lastly, $t_i$ represents the tau PET and MRI feature set. The relationship or edge that exists between two nodes $v_i$ and $v_j$ was denoted by $e_{ij}$. The construction of an AMGNN layer, the loss function definition, and the training strategy were the same as in Song et al. (2021), to which we refer the reader for further details.
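As an illustration of the graph initialisation described above, the sketch below builds a fully connected graph with unit edge weights over a handful of nodes. The dictionary keys (`label`, `risk`, `cfa`, `imaging`), the toy values and the function `init_graph` are hypothetical stand-ins for the four feature groups; they are not taken from the AMGNN code.

```python
from itertools import combinations

def init_graph(nodes):
    """Return the vertex list and unit edge weights of a fully connected graph."""
    vertices = list(range(len(nodes)))
    # One undirected edge per pair of vertices, each starting with weight 1.
    edges = {(i, j): 1.0 for i, j in combinations(vertices, 2)}
    return vertices, edges

# Hypothetical nodes: label, risk factors, CFA scores and imaging features.
nodes = [
    {"label": 0, "risk": [0.4, 1.0, 0.8], "cfa": [0.9, 0.7], "imaging": [0.2, 0.1]},
    {"label": 1, "risk": [0.6, 0.0, 0.7], "cfa": [0.6, 0.5], "imaging": [0.4, 0.3]},
    {"label": 2, "risk": [0.8, 1.0, 0.5], "cfa": [0.2, 0.3], "imaging": [0.9, 0.8]},
    {"label": 0, "risk": [0.3, 0.0, 0.9], "cfa": [0.8, 0.9], "imaging": [0.1, 0.2]},
]
vertices, edges = init_graph(nodes)
```

A fully connected graph on n nodes has n(n-1)/2 undirected edges, so the 4 toy nodes give 6 edges; for a batch of 64 nodes the same construction gives 2016.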

GNN Training and Testing
Using the 559 resulting nodes, small graphs were constructed from batches of 64 randomly selected samples. Small batches were used to reduce graph size and therefore computational complexity, which allowed many iterations to be executed quickly. The total number of training iterations was 250. The dataset was split into training and testing datasets at 80% and 20%, respectively. The model was trained on batches of 64 rows, with the loss updated after each batch. Model performance was tested using one-shot learning with a single training sample in each batch (Garcia and Bruna, 2018). This procedure was repeated for both versions of the dataset: one with the original clinical diagnosis labels and the other with cluster-based re-labelling. Model comparison and 3-class classification accuracy were visualised throughout the training and testing process using TensorBoard, a visualisation framework. Finally, the experiment was repeated 10 times to check for robustness and variability in performance. The rate of convergence during training sessions was also assessed.
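The data split and batch sampling can be sketched as below. The text does not say whether the split was stratified or whether batches were drawn without replacement, so both choices here (a plain shuffled split, sampling without replacement within a batch) are assumptions, as are the function names and seeds.

```python
import random

def split_indices(n, train_frac=0.8, seed=0):
    """Shuffle sample indices and split them into training and testing sets."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]

def sample_batch(train_idx, batch_size=64, rng=None):
    """Draw one small-graph batch of distinct training samples."""
    rng = rng or random.Random()
    return rng.sample(train_idx, batch_size)

train_idx, test_idx = split_indices(559)   # 559 participants, 80/20 split
rng = random.Random(1)
batches = [sample_batch(train_idx, 64, rng) for _ in range(250)]  # 250 iterations
```

With 559 samples this gives 447 training and 112 testing indices, and each of the 250 iterations sees a fresh 64-node graph drawn from the training set.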

Distinctive Data Cluster Characteristics in Low Dimensions
Using the UMAP dimensional reduction method (see Section 2.4), the initial dataset of 224 features was projected to 5 UMAP dimensions (Figure 2A). In the figure, the classes CN (dark purple), MCI (light purple) and AD (yellow), as labelled by clinician diagnosis, are denoted in different colours. A 5-dimensional UMAP space was selected because visual inspection showed that the AD cases were clustered tightly together along some of these dimensions (Figure 2A, yellow points). A simpler 3-dimensional UMAP projection is illustrated in Figure 2B. Using k-means clustering, 5 distinct clusters could be observed (denoted by different colours). When the data points were marked with their clinical diagnosis (CN, MCI or AD), we found that the distribution of clinical diagnosis did not conform well to the 5 discrete clusters (Table 2).
Next, we sought to understand the characteristics of each cluster. Here, we used an association rule-based classifier generated by the RIPPER algorithm (Section 2.4) to understand the association of the features with each cluster. Table 2 shows a summary of the sociodemographics and distribution of clinician diagnosis in each of the 5 clusters. In terms of years of education, all the clusters had similar average values of around 16-17 years. The 2 most distinctive clusters were the gender-specific clusters, in which only male or only female cases exist. This was consistent with previous unsupervised learning studies using different data (Gamberger et al., 2016; Alexander et al., 2021). Hence, we named these two clusters Male and Female (Table 2). Interestingly, the Male cluster had the oldest average age among all clusters and had a higher proportion of MCI cases than the Female cluster. In comparison, there was a cluster with the youngest average age (71 years old) and about equal proportions of male and female participants, which also had the highest proportion of CN cases. For now, we named this cluster the Young cluster. Another distinctive cluster was the one with the largest proportion of AD cases (52%), despite not having the oldest average age. We named this cluster the AD cluster even though MCI cases constituted 36% of it. The final cluster consisted of almost equally mixed gender and had a substantially high number of CN cases. For now, we named this the Catchall cluster.
By successively applying the rules of the association rule-based classifiers (Section 2.4), it is possible to discern some other features that distinguish each cluster. Table 3 summarises the results for the 5 clusters. In total, 7 association rules were found. First, the algorithm revealed that if there were 2 copies of the APoE ε4 allele, a known genetic risk factor (e.g. Angelopoulou et al., 2021), then 90% of the cases would belong to Cluster AD. We next applied the classifier to the remaining cases, and observed that if there was a high tau level in the left fusiform cortex with no APoE ε4 allele, then the algorithm predicted about 94% of the data to again be in Cluster AD. The fusiform brain region is known to be involved in cognitive function, particularly facial recognition, and AD patients have issues with recognising faces (e.g. Huang et al., 2020, 2021); hence this was a consistent finding. Interestingly, demographics and medical/family history also played a role in cluster membership. After applying the first two rules, if a remaining participant's mother had AD and the participant had no APoE ε4 allele, then there was a ~96% chance of being in Cluster Young; we renamed this cluster Young-FH (FH: family history). Applying the classifier again uncovered that if the participants were female with no APoE ε4 allele, then they were found in Cluster Female about 99% of the time.
For the remaining cases, when there was a low level of tau in the left entorhinal white matter with no APoE ε4 allele, there was a 100% chance of being in Cluster Male. It should be noted that the entorhinal brain region is associated with early-stage AD (Braak and Braak, 1991, 1996, 1997a, 1997b). Finally, of the rest of the data, about 98% belonged to Cluster Catchall.
We next further analysed the clusters' characteristics using a more extensive feature selection method: information gain of the data features with respect to cluster labels (Section 2.4). In particular, we hoped that this method would shed light on the Catchall cluster. The top-ranked features found by information gain that could distinguish the clusters were (ranked from the most important feature): APoE4, gender, (history of) mother with AD, ctx.rh.inferiorparietal, ctx.lh.middletemporal, lh.Amygdala, ctx.lh.inferiorparietal, ctx.lh.lateraloccipital, wm.lh.entorhinal, and wm.lh.inferiortemporal. Consistent with the association rule mining results, APoE4, gender and (history of) mother with AD were identified as the top 3 features. Moreover, the list was dominated by various tau PET-MRI imaging features (7 out of 10 features). Hence, we analysed the clusters using the ctx.lh.inferiorparietal (left inferior parietal cortex) and lh.Amygdala (left amygdala) features, based on their suggested links to early stages of MCI and AD (Greene et al., 2010; Coupé et al., 2019). Table 4 summarises the statistics of the ctx.lh.inferiorparietal and lh.Amygdala feature values across the 5 clusters. We can see that Cluster AD consistently had the largest values for all the statistics for both brain regions (Table 4, bold text), as expected. Interestingly, Cluster Catchall had the second largest values for most of the statistics (Table 4, bold text). This could be due to the 12% of clinically diagnosed AD cases in this cluster (the second highest proportion among all the clusters), albeit alongside 69% CN and 19% MCI cases (Table 2). Hence, this cluster might consist of participants at an early MCI or AD stage, with or without formal diagnosis by clinicians, i.e. with a potential neurological risk factor (NRF). We therefore renamed this cluster Cluster Catchall-NRF.
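The information-gain ranking used in this section can be sketched for categorical features as follows. This is an illustration of the general technique, not of the FSelector implementation used in the study: the functions, the toy feature values and the toy cluster labels are hypothetical, and continuous features such as the imaging measures would need to be discretised first (an assumption of this sketch).

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by splitting on a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical toy data: cluster membership and two candidate features.
clusters = ["AD", "AD", "Female", "Female"]
features = {
    "APoE4": [2, 2, 0, 0],             # separates the two clusters perfectly
    "gender": ["M", "F", "F", "F"],    # only partly informative
}
gains = {name: information_gain(vals, clusters) for name, vals in features.items()}
ranking = sorted(gains, key=gains.get, reverse=True)
```

A feature that splits the clusters perfectly recovers the full label entropy (1 bit in this toy case), which is why it is ranked first.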
TABLE 4. Summary of the statistics of participants' tau PET-MRI neuroimaging features, ctx.lh.inferiorparietal and lh.Amygdala, with respect to the 5 clusters. Values shown up to 3 decimal places. Bold: Prominently high values across each column and each brain region.

Re-labelling of Classes
When clinician diagnosis (CN, MCI or AD) was indicated in these clusters (Figure 2C), we observed overlaps between the UMAP clusters and the clinician diagnostic labelled classes, with substantial mixing between the classes in some of the clusters. As mentioned earlier, one of the clusters (Cluster AD) had a substantially higher concentration of AD cases. Hence, we hypothesised that we could re-label the CN and MCI cases in Cluster AD as AD cases (CN/MCI-to-AD), while re-labelling AD cases in the other clusters as MCI cases (AD-to-MCI). The re-labelled data points are shown in Figure 2D (compare with Figure 2C).
To validate this, we investigated post-hoc the tau PET-MRI features of these re-labelled cases to check whether they could together be considered a form of intermediate AD stage that is difficult to diagnose. Before doing that, we again used information gain to identify the top-ranked features with respect to cluster labels of the re-labelled data, and we found the following features (ranked from the most to least important): CFA harmonised composite memory (PHC_MEM), ctx.lh.entorhinal, another composite memory scale ADNI_MEM, wm.lh.entorhinal, lh.Amygdala, wm.lh.inferiortemporal, rh.Amygdala, wm.rh.inferiortemporal, ctx.lh.inferiortemporal and wm.lh.fusiform. Compared to the original dataset (see Section 3.1 above), only the lh.Amygdala and wm.lh.inferiortemporal features were common.
Next, based on the selected features, we made use of the PHC_MEM, lh.Amygdala and wm.lh.entorhinal features to perform a post-hoc check of the re-labelled cases, given the known link of the amygdala to early-stage AD (Dickerson et al., 2001; Killiany et al., 2002; Squire et al., 2004; Coupé et al., 2019). Table 5 shows the mean and standard deviation for these 2 features for the 4 cases: (i) re-labelled CN/MCI-to-AD in Cluster AD; (ii) AD-to-MCI outside Cluster AD; (iii) cases that remained AD in Cluster AD; and (iv) cases that remained non-AD outside Cluster AD. We can see that for the AD-to-MCI (outside Cluster AD) and CN/MCI-to-AD (in Cluster AD) re-labelled cases, the values of both the lh.Amygdala and wm.lh.entorhinal features were relatively similar, while lying between the values of the cases that remained AD and those that remained non-AD. These neuroimaging features supported the re-labelling hypothesis regarding clinician diagnostic uncertainty. With regard to the CFA feature, PHC_MEM, the pattern was not as clean. Specifically, the values for the AD-to-MCI (outside Cluster AD) and CN/MCI-to-AD (in Cluster AD) re-labelled cases were not very close to each other, although both had mean values lying between those of the cases that remained AD and those that remained non-AD. In fact, the AD-to-MCI cases seemed to have values leaning closer towards those of the cases that remained AD.
TABLE 5. Summary of statistics of participants' neuroimaging features lh.Amygdala and wm.lh.entorhinal for the re-labelled CN/MCI-to-AD cases in Cluster AD and the AD-to-MCI cases outside Cluster AD (bold text), compared with those of the cases that remained AD in Cluster AD and the cases that remained non-AD outside Cluster AD; hence 4 cases to consider. Values shown up to 3 decimal places.

GNN based Training and Classification of CN, MCI and AD
With the re-labelled data, we can now train the GNN model by Song et al. (2021) (Sections 2.5-2.6) using both the originally labelled data based on clinician diagnosis and the re-labelled data (Section 3.2). We then checked the convergence trends of the GNN models across all 30 experimental repetitions. We found that the GNN model using the re-labelled data converged faster than the model using the originally labelled data. The convergence trends for the first 10 repetitions of each experiment are shown in Figure 3. Specifically, the GNN model using the re-labelled data reached a 3-class classification accuracy of ~90% by about the 80th iteration, while the GNN model using the originally labelled data took ~120 iterations to reach the same accuracy level. Over the 30 repetitions, the GNN model using the re-labelled data achieved an average accuracy of 93.20±0.03%, compared with 90.06±0.04% for the model using the originally labelled data. A t-test comparing accuracy over the 30 repetitions yielded a p-value of 0.011. Hence, not only was the average accuracy of the cluster-label-based model significantly higher, but its performance was also less variable. This could be related to the original classification of the labels, with the labels perhaps following a less predictable pattern.
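The comparison of repeated runs can be sketched as below. The text reports only "a t-test", so the use of Welch's unequal-variance form is an assumption, and the accuracy lists are hypothetical toy values, not the study's results.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)  # sample variances
    se2 = var_a / len(a) + var_b / len(b)
    t = (mean_a - mean_b) / math.sqrt(se2)
    df = se2 ** 2 / (
        (var_a / len(a)) ** 2 / (len(a) - 1)
        + (var_b / len(b)) ** 2 / (len(b) - 1)
    )
    return t, df

# Hypothetical per-repetition accuracies for the two labelling schemes.
relabelled = [0.93, 0.94, 0.92, 0.93]
original = [0.90, 0.91, 0.89, 0.90]

t_stat, dof = welch_t(relabelled, original)
summary = (statistics.mean(relabelled), statistics.stdev(relabelled))
```

Turning the statistic into a p-value needs the t-distribution, which is not in the Python standard library; a statistics package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) would typically be used for that step.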

DISCUSSION
This study has successfully demonstrated the application of a GNN model on heterogeneous dementia data types using both clinician diagnostic labels and cluster-based labels. The heterogeneous data consisted of combined tau PET neuroimaging and MRI data, together with sociodemographic data, medical/family history and cognitive and functional assessments. The GNN model using the re-labelled data converged faster and with higher stability during training, while achieving significantly higher (3-class) classification accuracy than the model using the original clinician diagnostic labels.
GNNs have been introduced into AD classification studies in a variety of ways using different types of data, especially MRI and PET brain data (Parisot et al., 2018; Kazi et al., 2019a; Kazi et al., 2019b; Wen et al., 2020; Vivar et al., 2021; Zhu and Razavian, 2021; Yang et al., 2021; Bessadok et al., 2021). More recent advancements of GNNs on AD have been proposed to provide more flexibility (Zhu and Razavian, 2021; Song et al., 2021). However, these studies have not considered detailed tau PET neuroimaging data, a limitation given the closer alignment of tau PET with AD stages (Saint-Aubert et al., 2017; Ossenkoppele et al., 2021) than, e.g., amyloid PET, and the recent approval of its use by the U.S. FDA (Jie et al., 2021). Importantly, these studies, together with the literature on unsupervised learning approaches applied to AD (Gamberger et al., 2016; Gamberger et al., 2017; Martí-Juan et al., 2019; Ferreira et al., 2019; Alashwal et al., 2019; Katabathula et al., 2021; Prakash et al., 2021; Alexander et al., 2020; Alexander et al., 2021), have not used cluster-based class re-labelling for GNN classification of AD. Here, we have made use of the robust auto-metric GNN model by Song et al. (2021). Unlike Song et al. (2021), who used a more limited (TADPOLE) dataset, this work made use of the more extensive ADNIMERGE-3 dataset with individuals who had tau PET brain scans. Moreover, our work included more cognitive and functional assessments. Importantly, we made use of UMAP-cluster-based re-labelling to train the GNN model.
Prior to applying the GNN for AD classification, we used the nonlinear dimensional reduction method UMAP (Figure 2) to project the data into a lower, 3-dimensional UMAP space (Figures 2B-D) for data visualisation, for deeper insights and for guidance in the re-labelling (Figure 2D). Five discrete data clusters were identified using k-means clustering (Figure 2B). Applying association rule mining successively to the data, we found key feature characteristics (Tables 2 and 3) uniquely defining four of these five clusters. The clusters were a majority AD cluster, a fully male cluster, a fully female cluster, a cluster of younger participants with parental history of AD ("Young-FH" cluster), and a relatively undefined "Catchall" cluster. It is interesting to speculate that the Young-FH cluster could be due to relatively younger participants who were concerned about their own health given that their parents had a history of AD. The 2 gender-specific clusters identified seemed to be in line with previous studies (Gamberger et al., 2016; Prakash et al., 2021; Alexander et al., 2020; Alexander et al., 2021). Cluster AD membership was associated with having 2 copies of the APoE ε4 allele or having a high tau level in the left fusiform cortex, consistent with genetic risk factors (e.g. Angelopoulou et al., 2021) and late-stage AD brain changes (e.g. Huang et al., 2020, 2021), respectively. The substantially higher number of AD cases in this cluster provides hope that the unsupervised clustering approach might reveal something interesting about the disease while helping to guide the subsequent re-labelling of diagnostic classes. Further inspection using information gain feature selection identified the top 10 most important features within the 5 clusters. Other than the top 3 features (APoE4, gender and parental history of AD), the remainder were tau PET-MRI imaging features.
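To make the information gain criterion concrete, the following is a minimal sketch of how the information gain of a single categorical feature with respect to cluster membership can be computed, as the entropy of the cluster labels minus their conditional entropy given the feature. The function names and the toy values are hypothetical and are not taken from the study's code or data; continuous features such as tau PET levels would need to be binned first in this simplified form.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """H(labels) - H(labels | feature) for a categorical feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the labels within each feature value.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy illustration (hypothetical values): gender separates the two
# gender-specific clusters, whereas an unrelated feature does not.
clusters = ["Male", "Male", "Female", "Female"]
gender = ["M", "M", "F", "F"]
unrelated = ["a", "b", "a", "b"]
print(information_gain(gender, clusters))
print(information_gain(unrelated, clusters))
```

Ranking all features by this quantity and keeping the highest-scoring ones is one simple way to obtain a "top 10" list of the kind described above.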
Based on their suggested links to early stages of MCI and AD (Greene et al., 2010; Coupé et al., 2019), we selected 2 of these tau PET-MRI features, the ctx.lh.inferiorparietal (left inferior parietal cortex) and lh.Amygdala (left amygdala) features, to provide further insights into the clusters. It turned out that the previously named Catchall cluster had levels of tau between those of the AD cluster and the 3 other clusters (Table 4). This might indicate that this cluster constitutes cases with underlying neurological risk factors (NRF), and we therefore renamed it the Catchall-NRF cluster.
Next, we re-labelled the diagnostic classes based around the AD cluster such that AD cases occurred only within this cluster, while those AD cases outside of it were re-labelled as a prodromal stage of AD. Specifically, we re-labelled the CN and MCI cases in Cluster AD as AD cases (CN/MCI-to-AD), while re-labelling AD cases in the other clusters as MCI cases (AD-to-MCI). This process was supported by post-hoc analysis of 2 tau PET-MRI features (lh.Amygdala and wm.lh.entorhinal) and 1 CFA feature (PHC_MEM), obtained from information gain feature selection of the re-labelled data. This analysis revealed similar brain tau PET levels for the CN/MCI-to-AD and AD-to-MCI cases, lying between those of the remaining AD and non-AD cases. These 2 brain regions were chosen based on their suggested links to early-stage AD (Dickerson et al., 2001; Killiany et al., 2002; Squire et al., 2004; Coupé et al., 2019). These re-labelled data led to a more accurate and stable GNN model for detecting CN, MCI and AD groups. In comparison, although PHC_MEM followed a similar trend, it was not as clear-cut. This showed that CFAs might not be as well related to the clusters. This was to be expected, as the overall features leading to the clusters were dominated by tau PET-MRI neuroimaging features, and neuroimaging features were more objective than CFAs.
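The re-labelling rule described above can be sketched as a small function. This is a minimal illustration under stated assumptions: the function name `relabel`, the string encodings of the diagnoses and the cluster identifiers are hypothetical, not the authors' implementation.

```python
def relabel(diagnosis, cluster, ad_cluster="AD"):
    """Apply the cluster-based re-labelling rule to one participant.

    diagnosis: clinician label, one of "CN", "MCI" or "AD".
    cluster:   identifier of the participant's cluster (hypothetical
               names such as "AD", "Male", "Female", "Young-FH",
               "Catchall-NRF").
    """
    # CN/MCI-to-AD: non-AD cases inside Cluster AD become AD.
    if cluster == ad_cluster and diagnosis in ("CN", "MCI"):
        return "AD"
    # AD-to-MCI: AD cases outside Cluster AD become MCI.
    if cluster != ad_cluster and diagnosis == "AD":
        return "MCI"
    # All other cases keep their clinician label.
    return diagnosis


# Hypothetical participants as (clinician diagnosis, cluster) pairs.
participants = [("CN", "AD"), ("AD", "Male"), ("MCI", "Young-FH"), ("AD", "AD")]
print([relabel(d, c) for d, c in participants])
```

After this rule is applied, every case labelled AD lies inside Cluster AD, which matches the stated aim that AD cases occur only within this cluster.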
Taken together, we have successfully incorporated both unsupervised learning (UMAP, k-means clustering, association rule mining) and supervised learning (information gain feature selection) to provide insights into tau PET-MRI and other non-neuroimaging heterogeneous data, and subsequently used supervised learning (GNN) for detecting CN/MCI/AD. Importantly, combined UMAP and clustering highlighted new facets in terms of features that can be used for AD diagnosis, in an attempt to deconstruct the current standards for determining features actively contributing to AD. Ultimately, clustering analysis places more emphasis on exploring unknown AD contributors as opposed to accepting the current status quo. It should, however, be noted that the re-labelling of the data required not just algorithmic approaches, but also information from clinical diagnosis of AD. Indeed, we were fortunate to identify a data cluster comprising mainly AD cases, even though the MCI cases were rather more distributed. Future work should investigate situations where diagnostic cases are more distributed, and distinguish the subtypes of MCI and AD cases and their progression (Gamberger et al., 2017; Ferreira et al., 2017, 2019, 2020; Alexander et al., 2020; Alexander et al., 2021; Mitelpunkt et al., 2020; Vogel et al., 2021).
The granularity of the data used to train the GNN model could be expanded to derive new patterns governing AD development. For instance, this could involve the inclusion of blood- or cerebrospinal-based biomarkers or, with a larger number of PET scan participants in the ADNI data, the breaking down of additional cognitive and functional assessments into their smaller components, such as specific questions in a neuropsychological questionnaire (McCombe et al., 2021). In fact, it would be worth investigating a larger dataset with PET-MRI features to validate our clustering and AD classification findings. As the ADNI study continues, it may also become possible to look at the PET-MRI data longitudinally, for example, to ascertain whether members of the NRF cluster are at increased risk of cognitive decline. It would also be of great value to explore the decision-making processes of GNNs in more depth. For example, Wen et al. (2020) have provided an extensive discussion of GNN modelling and feature interpretation, processes which could be adapted to the GNN modelling techniques utilised in this study to further improve the interpretability and accessibility of ML research in computer-aided diagnostic systems. Further opportunities exist in the use of MRI/PET data analysis within AD diagnosis. In our study, neuroimaging data were processed using PETPVE12 according to the Desikan-Killiany atlas (Desikan et al., 2006); however, the neuroimaging data could be processed according to a variety of atlas standards, which may permit the GNN model to provide different interpretations of the same underlying data, or perhaps even improve its diagnostic accuracy.

CONCLUSION
The aim of this study was to demonstrate the benefits of cluster-based labelling for graph neural network multi-class classification of AD, using tau PET imaging together with other heterogeneous data. Original classes based on clinician diagnosis were used, and cluster-based re-labelled classes were devised using latent features derived from UMAP and various analytical methods. The high diagnostic accuracy of the latter approach highlighted how the labelling of an individual as having MCI or AD could be disrupted via less subjective or biased measures. This study reinforces the need for current AD diagnostic practices to continue to be re-evaluated using more objective, data-driven methods, e.g. unsupervised learning and clustering, to derive new patterns and sub-groups from existing datasets.

DATA AVAILABILITY STATEMENT
Python and R code for clustering, data analysis and classification is available from the authors upon reasonable request. Requests to access the original datasets should be directed to ADNI (http://adni.loni.usc.edu/).

AUTHOR CONTRIBUTIONS
NM, JB and KW-L conceptualized this study. JMS-B processed the combined PET-MRI data. NM and JB developed the code for dimensional reduction, and NM developed the code for clustering and feature selection. JB developed the code for graph neural network modelling, and NM and JB performed the analyses. NM, JB and KW-L interpreted the results and wrote the bulk of the document. DPF and PLM provided comments and insight into the analyses, and contributed to the editing of the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
This work was supported by the European Union's INTERREG VA Programme, managed by the Special EU Programmes Body (SEUPB; Centre for Personalised Medicine, IVA 5036). The views and opinions expressed in this paper do not necessarily reflect those of the European Commission or the Special EU Programmes Body (SEUPB).

ACKNOWLEDGMENTS
We thank Alok Joshi for providing useful feedback on the work and an earlier version of the manuscript. Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug