Retinal texture biomarkers may help to discriminate between Alzheimer’s, Parkinson’s, and healthy controls

A top priority in biomarker development for Alzheimer’s disease (AD) and Parkinson’s disease (PD) is the focus on early diagnosis, where the use of the retina is a promising avenue of research. We computed fundus images from optical coherence tomography (OCT) data and analysed the structural arrangement of the retinal tissue using texture metrics. We built clinical class classification models to distinguish between healthy controls (HC), AD, and PD, using machine learning (support vector machines). Median sensitivity is 88.7%, 79.5% and 77.8%, for HC, AD, and PD eyes, respectively. When the same subject has the same classification for both eyes, 94.4% (median) of the classifications are correct. A significant amount of information discriminating between multiple neurodegenerative states is conveyed by OCT imaging of the human retina, even when differences in thickness are not yet present. This technique may allow for simultaneously diagnosing Alzheimer’s and Parkinson’s diseases.


Introduction
Diagnosis of both Alzheimer's disease (AD) and Parkinson's disease (PD) remains a major challenge. Simple diagnostic tests serving as biomarkers for tracking the onset and progression of both diseases would be extremely valuable.
AD affects 5 to 7% of people over the age of sixty [1], and a total of 5.4 million patients in the United States [2]. For PD, the incidence has been reported to be 4.5-19 per 100 000 population, per year [3]; while its age-adjusted prevalence is estimated to range between a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 72 and 259 per 100 000. In the US, the direct costs of health and social care associated with dementia patients have been estimated to reach USD 100 billion per year [3]. Similarly, for PD patients the total annual cost is more than the double of that of the control population [3].
Even though epidemiological evidence suggests a potential reduction in dementia incidence in recent years, its prevalence, according to Shah and co-workers [4], is expected to increase in the coming decades due to population ageing. The impact of dementia in patients and the overall society was acknowledged in the First WHO Ministerial Conference on Global Action Against Dementia (March 2015), bringing together, among others, representatives from 80 WHO Member States and four UN agencies [4]. On the top priorities in the domain of diagnosis, one can find the research aiming to promote timely and accurate diagnosis of dementia through the development and validation of simple riskless biomarkers that could even be used in prodromal disease stages. Moreover, the focus on the need to ". . . identify similarities and differences between diseases and dementia subtypes, and assess progression from premanifest (presymptomatic) to late-stage diseases" [4] is stressed in the report. Furthermore, priorities point to the need to diagnose dementia in the primary care practices, thus putting aside the focus on the diagnosis based on brain imaging requiring instrumentation available only at advanced imaging centres, where the access is only possible to a fraction of the populationnot to mention the associated costs.
The morphological effects of the neurodegenerative process seem to be similar in the retina and the brain [5]. Therefore, the use of the retina as a window into the brain [6][7][8] might provide a simple solution for the establishment of reliable image-based biomarkers for neurodegeneration. This concept has been extensively exploited, mostly regarding the onset and progression of AD [9,10] and PD [11,12]. Notably, there is published evidence that the retinal structure might be altered in the course of these two neurodegenerative disorders [13][14][15][16].
When compared to brain imaging, the easy access to and lower operational cost of the ophthalmological imaging techniques are definite advantages, bringing the diagnosis closer to the primary care practices. Even though optical coherence tomography (OCT) has been extensively used in assessing neurodegenerative disorders, the hallmark has been the use of thickness measurements, either at the individual retinal layer level or in an aggregated fashion. While the thinning of specific layers has been the standard finding for both disorders, some contradictory findings can be found in the literature [9,[17][18][19][20][21]. Particularly in the case of AD, it has recently been reported that the thickness of some of the inner retinal layers undergoes dynamic changes as the disease progresses [22]. Moreover, with the retinal nerve fibre layer (RNFL) and retinal ganglion cell axons being the top candidates for the occurrence of neurodegeneration, it should be kept in mind that the thinning of this layer may occur for other types of dementia beyond AD or PD, such as Lewy body dementia [8].
Recent research [23][24][25][26][27][28] has pointed towards texture analysis as a promising tool in the study of biomarkers for neurodegenerative diseases. The term "texture analysis" encompasses a wide range of methods that allow for the characterisation of the underlying image patterns [29][30][31]. Such methods, when applied to images of the human retina, can help identify changes in the structural arrangement of the retina or that of specific retinal layers, both in health and disease.
In the present study, we address these issues by imaging the ocular fundus by OCT of healthy controls (HC), as well as that of patients diagnosed with AD and PD. Using a single age-matched control group and two groups of patients, we were able to develop a classification system based on texture characteristics of fundus images computed from OCT data. This system achieved a significant triple clinical classification accuracy.

Participants
OCT data from 20 patients diagnosed with AD, 28 patients diagnosed with PD and 27 agematched HC were gathered from the authors' institutional OCT database. The two studies (AD and PD) from which data were collected were approved by the Ethics Committee of the Faculty of Medicine of the University of Coimbra and were conducted according to the principles stated in the Declaration of Helsinki [32]. Written informed consents were obtained from all participants. Before the inclusion in this study, the AD patients underwent a thorough neuropsychological evaluation process, administered by experienced neurologists. As all these patients scored a value of 1 in the clinical dementia rating (CDR) scale, they were deemed capable of signing the written informed consent themselves. The AD and PD groups were recruited respectively at the Dementia and the Movement Disorders Units of the Neurological Department of the Centro Hospitalar e Universitário de Coimbra, where they were assessed by experienced neurologists. All participants had been previously imaged using the same OCT device and acquisition protocol on both eyes. Due to the low quality of the signal, four eyes were rejected: two from the AD group and two from the PD group.
For the diagnosis of AD, the standard criteria [33] was used, and patients were extensively characterised using formal neuropsychological evaluation as well as specific cognitive and functional staging scales, including the Montreal cognitive assessment (MoCA) [34,35]please refer to S1 Table for the MoCA scores. Patients were further investigated with obligatory laboratory, imaging studies and cerebrospinal fluid (CSF) analysis to exclude other forms of reversible dementia or systemic diseases. According to the main aims of the study, patients were strictly selected with a probable diagnosis of AD supported by positive CSF biomarkers (amyloid-β42, total tau and phosphorylated tau) and positive Pittsburgh Compound-B (PiB) and positron emission tomography (PET).
Further details on the methods used for the collection and analysis of CSF and PiB-PET data can be found in [36].
All patients had recently converted to AD from a prior stage of mild cognitive impairment. They were all in mild stage (CDR score = 1), with diagnosis duration ranging from zero to two years (see S1 Table for more detail) and were in a stable condition. Retinal injuries and retinopathies, optic neuropathies secondary to other factors (e.g. glaucoma, diabetes, age-related macular degeneration) and severe visual impairment were exclusion criteria.
The PD patients were assessed using the MoCA, the unified Parkinson's disease rating scale (UPDRS), and Hoehn and Yahr staging (test scores are available in S2 Table). These patients were diagnosed by a movement disorder neurologist, according to the criteria defined by the UK Parkinsons's Disease Society Brain Bank [37]. Only patients with ages ranging between 40 and 85 years old were included in the study. Subjects showing signs of advanced dementia, severe depression and history of substance abuse were excluded. Furthermore, a neuro-ophthalmologist assessed the patients and those with intrinsic optic nerve or macula pathology were excluded.
All subjects in the present study were selected towards the best age-match between groups and a balanced distribution by gender within each group. At the time of OCT scan, the PD patients had a diagnosis duration that ranged from one to 19 years (Q1/median/Q3: 2.75/5.00/ 10.00)-please refer to S2 Table for more detail. These patients' disease duration is modestly correlated (R 2 = 0.035) to their age (S1 Fig). Demographic data for all subjects in the current study are presented in Table 1.

Imaging protocol
Retinal imaging was performed using the Cirrus SD-OCT 5000 (Carl Zeiss, Meditec, Dublin, CA, USA) and the 512x128 Macular Cube protocol, centred on the macula.

Data pre-processing
The gathered volumetric data were further processed using the OCT Explorer software (Retinal Image Analysis Lab, Iowa Institute for Biomedical Imaging, Iowa City, IA, USA) to segment eleven retinal layers: the RNFL, ganglion cell layer (GCL), inner plexiform layer (IPL), inner nuclear layer (INL), outer plexiform layer (OPL), outer nuclear layer (ONL), inner segment/outer segment junction, outer segment, outer photoreceptor, subretinal virtual space, and retinal pigment epithelium (RPE). These segmentations were visually examined and manually corrected where necessary.
From each of the six inner retinal layers, the RNFL, the GCL, the IPL, the INL, the OPL, and the ONL, a mean value fundus (MVF) image [38] was computed as the average of the A-scan values between the two retinal layer interfaces defining the layer at study. These six images per eye constitute the data to be further processed and analysed throughout this work. All left eyes were horizontally flipped to match the right ones and to allow metrics to keep the same relative position.

Retinal thickness
The macular thickness was computed for each of the six retinal layers at study in this work, along with the full retinal thickness, i.e. the thickness from the inner limiting membrane to the top of the RPE.

Texture analysis
There are several methods for retrieving textural features from an image. Frequently, statistical methods such as the gray level co-occurrence matrix (GLCM) [39,40] are employed. These methods provide insight into the patterns and relative distribution of the image's intensity values. In the particular case of the GLCM, localised grey-level variations are recognised throughout different directions, and scales, examining pixel pairs iteratively. Alternatively, spectral texture features can be computed. By evaluating spatial frequencies at multiple scales, the wavelet transform has a natural application in texture analysis, where it can help to identify general/coarse texture features that statistical approaches like the GLCM fail to capture [29]. For this reason, wavelet-based texture descriptors have been computed and used in the classification of different types of medical images [41][42][43][44][45]. Both GLCM and wavelet-based texture features were used in this work to capture, respectively, local and global texture information. Regarding the GLCM, each image was first down-sampled to 128x128 pixels to obtain isotropic sampling in the horizontal and vertical directions and converted to 16 grey-level to limit the size of the GLCM matrix. These images were then split into 7x7 blocks to be independently analysed. The supremum over the different directions of each of the 20 metrics computed from the GLCM, for each block, was chosen as the respective value for the block. This information was later aggregated per quadrant, the superior-temporal, the superior-nasal, the inferiortemporal and the inferior-nasal quadrants. Each quadrant metric is the average of that of the respective 3x3 blocks, leaving out the row and column passing through the fovea (Fig 2). As such, a total of 80 GLCM-based features characterise each layer of the retina.
Concerning spectral texture features, the dual-tree complex wavelet transform (DTCWT) [46] was applied to each of the fundus images. The variance and entropy of the magnitude of the DTCWT complex coefficients were computed for all image subbands, following the approach used previously [47,48]. Exploratory analysis revealed that the variance alone, when computed for the six directionally selective subbands (±15˚, ±45˚and ±75˚) at decomposition level one, leads to the best performance. These six directional variance values were therefore combined with the 80 GLCM-based features to form the final feature vector, composed of a total of 86 features per layer.

Classification
The selected method for data classification was the support vector machine (SVM). SVMbased classification [49,50] is a two-step procedure where, in the first step, SVMs learn the differences between two groups to establish a classification model (training phase). In the second step, this model is used to classify cases not used during the first step. The validity of the model can be assessed (test phase) by using known cases in the second-step and comparing the predicted classifications to the real ones.
Prior to the classification, all features were independently z-score transformed to eliminate differences in scale that may interfere with the system's performance.
In total, 18 independent binary SVM models (with the radial basis function kernel) were developed, one per each of the six retinal layers at study and between any two of the three possible groups at study: the HC, AD, and PD groups.
Six features were considered for each of the 18 models to achieve the best classification performance while avoiding overtraining, considering the number of cases (eyes) in each of the groups (classes). The identification of the particular set of features for each of the models was performed by resorting to a recursive forward selection procedure. In this approach, a single feature is tested at a time, and the one allowing for the best performance is selected. After that, at a time, each one of the remaining features is considered, and the performance based on the two features (the feature currently being considered, plus the previously selected "best" one) is assessed. The set presenting the best performance is kept, and the process repeats until the best set with the predefined number of features is found.
The performance of each of the classification processes was assessed based on the k-fold cross-validation. In this procedure, the dataset at hand is split into k sets. The system is then trained with k-1 sets and tested on the set left out from the training. The testing consists of predicting the classes where the cases left out from the training would belong to, based on the model developed during the training phase, and comparing the predictions with the known (real) classification. This procedure is repeated k times.
For the classification of an eye in one of the three possible classes, a one-versus-one approach was used, along with a voting scheme. As such, for each of the six retinal layers at study, each eye was tested against the model between the control and AD classes, between the control and PD classes and between the AD and PD classes. For each of these three tests, the eye received a vote in one of the two classes under evaluation.
At the end of this procedure, the eye was classified into the most voted class. If all three classes had received the same number of votes, a tie was considered, and the eye was classified as "Unknown". If two classes received the same number of votes, the final decision was made by considering the result of the classification between those two classes. A diagram illustrating the classification process is shown in S2 Fig.

Statistical analysis
Based on the classifications of all eyes, using the classification process described above, a 3x4 confusion matrix was built, where the three rows represent the real (known) classes, and the four columns represent the classes according to the data and classification models. The extra "Unknown" class is composed of the cases with ties on the classification.
From the confusion matrix, the sensitivity and specificity were computed for each class: the control, the AD, and PD classes. The system accuracy was also computed.
Additionally, the percentage of participants with both eyes receiving the same classification was determined, along with the percentage of these that were correctly classified.
The last computed parameter was the percentage of eyes in the "Unknown" class, that is, the percentage of eyes where, from the texture point of view, no substantial differences were found between the HC, AD, and PD study populations.
All results were computed using Matlab R2017b (The MathWorks Inc., Natick, MA, USA) running on a personal computer. Table 2 presents the results of the full retinal thickness measurement distributions for the three groups. Thickness values are the average thickness within nine regions centred on the fovea. The central region is the circular area of 1000 micrometres in diameter, inner macular areas are within the radii 1000 and 1500 micrometres, and outer macular areas are within the radii 1500 and 3000 micrometres.

Results
For each of the nine areas specified in Table 2, a conventional one-way ANOVA was performed, in order to verify whether or not the thickness data alone can differentiate between the three groups under study. No significant main effects were found between the three groups, for any of the regions.
Regarding the classification, validation was carried out resorting to the k-fold cross-validation. k values of two, five and ten were used, where k = 2 represents the most conservative approach, and k = 5 and k = 10, the figures traditionally used, for comparison purposes only.
When the RNFL is ignored, the present analysis yields the best results. Therefore, presented results are based on the five of the six innermost layers, RNFL excluded-namely the GCL, IPL, INL, OPL and ONL layers.
Because the data partition (data split into training and testing sets) is random, and therefore distinct at each run, results are presented as distributions of performance indicators from 100 consecutive runs (Table 3).
Only four of the features used in the classification process showed a strong correlation with the thickness of the respective layer and quadrant, with the correlation coefficient ranging from 0.70 to 0.73. Another 17 features showed a moderate correlation (greater than or equal to 0.50). None of the features presents a strong correlation in more than one of the three groups, and only two present a strong correlation in one group and a moderate correlation in another group. As a result, it is possible to state that texture features under analysis convey information Table 2. Full-retinal thickness. Sectors mimic the ETDRS thickness map. Inner macular areas are within the radii 500 and 1500 micrometres, and outer macular areas are within the radii 1500 and 3000 micrometres.   on differences present in the retina that are not conveyed by thickness. For further information on the correlation values between these 21 features and thickness, refer to S3 Table. The results obtained, even when considering the most conservative scenario (k = 2), clearly indicate that these biomarkers are useful in classifying cases into one of the three considered groups. Moreover, such biomarkers allow distinguishing between AD and PD eyes and patients. Additionally, these results state that, in just 1 to 2% of the cases, no differences seem to exist between the health status and either of the two neurodegenerative disorders considered in this work, nor between the two diseases themselves. Of particular importance is the case when the two eyes of the same subject received the same classification. In these conditions, 82% to 96% of the cases are correctly classified solely based on the analysis of the texture of the retina, as imaged by OCT.

Healthy Controls Alzheimer's Disease Parkinson's Disease p-value (ANOVA)
Leaving out the RNFL, a total of 15 classification models were employed (three classification tests for each of the five remaining inner layers). While global texture metrics were used in 10 of these 15 models (66.7%), they represent only 16.7% of the total used features.
The analysis of the regional origin of the features that led to the above classification results shows an uneven contribution of the macular quadrants of the human retina. For the discrimination between the healthy control group and the AD group, over 65% (17 out of 26) features come from the superior macular region, seven from the superior-temporal quadrant and ten from the superior-nasal quadrant. Six and three features come from, respectively, the inferiortemporal and inferior-nasal quadrants. Additional four features come from the global texture.
Concerning the total number of features discriminating between the healthy control group and the PD group, there is an equal contribution from the superior and inferior regions of the retina. Similarly, there is an equal contribution from the temporal and nasal regions. Nevertheless, the superior-nasal and inferior-temporal macular regions contribute with only three features each (3/23), while the superior-temporal and inferior-nasal macular regions contribute, respectively, with ten and seven features. Moreover, seven features come from the global texture, almost twice that of the AD group.

Discussion
Our results indicate that SVM, a supervised machine learning method, may aid in the concomitant clinical diagnosis of AD and PD, even in the absence of univariate differences on average thickness. The technique herewith presented demonstrates to be able not only in discriminating between eyes of HC and patients but also in distinguishing between the eyes of AD and PD patients. Furthermore, this method is based on the non-invasive imaging of the ocular fundus by OCT, a widely available technique, nowadays standard in private clinical practice. Particularly encouraging in terms of the reliability and reproducibility of the approach is the fact that the vast majority of people receiving the same classification on both eyes do, in fact, have the correct classification, with median percentages of 91.4 (2-fold cross-validation) up to 94.4 (10-fold cross-validation).
In this work, we used non-linear (radial basis function kernel) SVMs. They allow for the identification of relevant features that distinguish between any two of the three groups, and for the use of these features to classify cases into one of the three possible groups.
The six wavelet-based parameters used were computed from the six directionally-selective detail subband images, which were isolated from the lower-frequency (i.e. less detailed) image information at the first level of the wavelet decomposition process. Note that each of these subbands captures the original image's details oriented at one out of six spatial orientations (±15˚, ±45˚, ±75˚). As these parameters correspond to the variance of the magnitude of the complex wavelet coefficients of the respective subband, they represent a measure of the spread of the grey-level distribution of that same subband, and ultimately reflect the contrast of the image's texture.
Global texture metrics were used in 66.7% of the classification models suggesting that these differences are spread over the entire macular region. Nevertheless, they represent only 16.7% of the total used features, which reinforces the need for local analysis to detect actual differences between health status and neurodegenerative diseases.
In the present work, the methodological research made on OCT data is far from its classical use, which may explain its discrimination power. While most of the reported works in the field rely on thickness measurements, texture data (from images computed per each retinal layer) were the metrics under evaluation here. In other words, our approach captures multivariate information, which is just not possible with simple thickness-based univariate comparisons. The inherent limitation of thickness-based approaches is likely a contributing factor to the inconsistent findings reported for both AD and PD. While several studies do not find thickness differences between HC and AD or PD patients [17,19,22], others report significant thinning of different retinal layers [9,18,20,21]. To add to this inconclusiveness, when thickness differences between each of these two neurodegenerative disorders and the healthy condition are indeed reported, they typically concern retinal thickness measurements performed on patients of either disorder's course at stages other than the initial stage.
Texture conveys information on the regular or irregular distribution of image intensity and, as such, it carries information on the structural arrangement of the different retinal layers and how they differ between the health, AD and PD conditions. Currently, the clinical diagnosis of these two neurodegenerative disorders is a challenging task, since there are no definite in vivo biomarkers. Texture analysis of OCT data may thus represent a novel tool in the identification of these diseases, providing a simple, inexpensive and non-invasive method of directly assessing neurodegeneration.