A novel semi-supervised multi-view clustering framework for screening Parkinson's disease

In recent years, many studies have addressed the diagnosis of Parkinson's disease (PD) from brain magnetic resonance imaging (MRI) using traditional unsupervised machine learning methods and supervised deep learning models. However, unsupervised methods struggle to extract accurate features from MRIs, and it is difficult to collect enough PD data to train deep learning models. Moreover, most existing studies rely on single-view MRI data, whose characteristics are not sufficiently informative. To tackle these drawbacks, we propose a novel semi-supervised learning framework called Semi-supervised Multi-view learning Clustering architecture technology (SMC). The model first applies the sliding window method to capture different features, and then uses Linear Discriminant Analysis (LDA) to reduce the dimensionality of the data for each feature. Finally, traditional single-view and multi-view clustering methods are applied to the multiple feature views to obtain the results. Experiments show that the proposed method outperforms state-of-the-art unsupervised learning models in clustering performance. Our work could therefore help improve the identification of PD from previously labeled and subsequently unlabeled MRI data in realistic medical settings.


Introduction
Parkinson's disease (PD) is a degenerative and disabling disease of the nervous system [1] that generally occurs in the elderly. Its clinical manifestations mainly include resting tremor, bradykinesia (slowness of movement), muscle rigidity and postural gait disorder. PD not only affects the patient's quality of life, but also places a heavy burden on the patient's family and on society. According to medical statistics, PD is rare in people under 40, the average age of people with PD is 60 years, and the incidence rate tends to increase with age [2] - [5].
For the elderly, screening for PD as early as possible is vital for prevention, for delaying progression and for supporting diagnosis. At present, the diagnosis of PD mainly relies on the clinical symptoms of patients and the professional knowledge of clinical neurologists. However, missed diagnoses and misdiagnoses may occur because of the complexity of PD pathology [6]. Most doctors recommend a neuroimaging examination before the formal clinical diagnosis of PD, such as magnetic resonance imaging (MRI), functional magnetic resonance imaging (fMRI) or positron emission tomography (PET). In this study, we evaluate the proposed method on MRI data obtained from the Parkinson's Progression Markers Initiative (PPMI) platform [7].
Several machine learning techniques have been used to automatically diagnose PD and predict clinical diagnostic scores. Shi et al. [8] constructed a novel cascaded multicolumn RVFL+ (cmcRVFL+) framework for single-modal neuroimaging-based diagnosis of PD to reduce the difficulty of data acquisition. Peng et al. [9] used a multilevel-ROI-features-based machine learning method to detect sensitive morphometric biomarkers in PD. Prashanth et al. [10] developed a machine learning classification model to separate the degenerative early PD class from the healthy normal/non-degenerative class. Oliveira et al. [11] studied a fully automatic computational solution for computer-aided diagnosis of PD using support vector machines and a voxel-as-feature approach on single photon emission computed tomography brain images. Garraux et al. [12] used a relevance vector machine in combination with bootstrap resampling for nonhierarchical multiclass classification based on fluorodeoxyglucose positron emission tomography scans of PD patients. Long et al. [13] proposed a non-invasive technology for the diagnosis of early PD that integrates the advantages of various models with multi-modal MRI data. Abs et al. [14] investigated connection-wise patterns of functional connectivity to distinguish Parkinson's disease patients according to their cognitive status using machine learning methods. Lei et al. [15] discussed a joint regression and classification framework for PD diagnosis via magnetic resonance and diffusion tensor imaging data. They devised a unified multitask feature selection model to explore multiple relationships among features, samples, and clinical scores. Adeli et al. [16] proposed an approach to diagnose PD with MRI data using a joint feature-sample selection (JFSS) method and a robust classification framework. Adeli et al. [17] also investigated a joint kernel-based feature selection and classification framework for early diagnosis of PD on multimodal neuroimaging data.
However, the existing approaches to automatic diagnosis mainly focus on classification or prediction of PD. Studies on auxiliary screening and diagnosis of PD from MRI data with semi-supervised multi-view clustering approaches are scarce, and few screening models aim at preventing and delaying the deterioration of PD. In this work, we propose a novel semi-supervised multi-view learning framework with clustering, built on Robust Multi-View K-Means Clustering (RMKMC) [18] and the cross-validation method [19], namely SMC (shorthand for Semi-supervised Multi-view Clustering architecture). We train and test our model on multi-view brain MRI data after data representation with Linear Discriminant Analysis (LDA) [20] and the sliding window method [21]. Compared with existing clustering methods, such as the Gaussian Mixture Model (GMM) [22], K-Means [23], K-Medoids [24], the Agglomerative Clustering algorithm (AC) [25], Balanced Iterative Reducing and Clustering Using Hierarchies (Birch) [26] and the Spectral Clustering algorithm (SC) [27], our model achieves the best results and effectively partitions three types of samples (i.e., non-patients, prodromal PD and confirmed PD). The results can support doctors in auxiliary diagnosis, especially in screening latent PD, which can promote early diagnosis and treatment, delay disease progression and further reduce the occurrence of Parkinson's disease.
In summary, this paper makes the following contributions:
1) It proposes a novel semi-supervised multi-view learning architecture with clustering (SMC) for screening Parkinson's disease.
2) The proposed SMC uses the dimensionality reduction algorithm of Linear Discriminant Analysis (LDA) and the sliding window method to extract effective feature information from the MRI data. In this way, the supervised reduction step can make full use of the color and texture features in MRI images and avoid overfitting caused by data dimension imbalance.
3) It applies multi-view learning to the proposed semi-supervised learning architecture. Multi-view learning can capture key features from each single view and also integrate them into a comprehensive representation of the MRI features.
The rest of this paper is organized as follows. Section 2 introduces the preprocessed dataset, our proposed SMC model and the experimental design. Section 3 presents the experimental results. The discussion follows in Section 4. Finally, Section 5 concludes this paper.

Data Preprocessing
To improve the accuracy and performance of our method in the experiments, the original MRI data sets need to be standardized. The selected data sets are preprocessed in several steps: obtaining the gray value image, applying a median filter to remove noise points, linearly normalizing the gray values, and finally extracting the Region Of Interest (ROI) shown in Figure 2 by detecting the edge features of the brain MRI. The swallow tail area of each sample is also marked. For illustration, the swallow tail areas of two MRIs for the three classes are shown in Figure 3.
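The median filtering and linear normalization steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 3x3 filter window, the [0, 1] target range and the function names are assumptions, and gray-value conversion and ROI extraction are omitted.

```python
import numpy as np

def median_filter3(img):
    """3x3 median filter with edge replication (window size is an assumption)."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Stack the nine shifted copies of the image and take the per-pixel median.
    shifts = [padded[r:r + h, c:c + w] for r in range(3) for c in range(3)]
    return np.median(np.stack(shifts), axis=0)

def normalize_linear(img):
    """Linearly rescale gray values to [0, 1]; a constant image maps to zeros."""
    img = img.astype(float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img)
    return (img - lo) / (hi - lo)

def preprocess(img):
    """Denoise first, then normalize, following the order given in the text."""
    return normalize_linear(median_filter3(img))

# Hypothetical demo image: flat background with one isolated noise point.
demo = np.full((5, 5), 10.0)
demo[2, 2] = 255.0
cleaned = median_filter3(demo)
```

In the demo, the isolated bright pixel is replaced by the median of its neighbourhood, which is the behaviour expected from a noise-removal step.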

Feature Extraction
To make full use of the color and texture features in the MRI data sets, a 7x7 sliding window is used for feature extraction, as shown in Figure 4. The initial position of the green box is the upper left corner of the ROI. After a subimage is extracted, the window is moved from left to right and from top to bottom until it overlaps the red box at the lower right corner of the ROI [21]. Each subimage is quantized to 16 gray levels. Four texture features are then computed from the gray level co-occurrence matrix: contrast, homogeneity, energy and correlation [29]. Three statistical features are also extracted: standard deviation (sigma), skew and kurtosis. As a result, 7 different types of features are extracted from each sliding window. We then reconstruct the 7 types of features as 7 data views, where the dimension of each view equals the number of image pixels and the features are independent of each other. The seven views representing the different categories of features are shown in Figure 5.
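The per-window feature computation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the co-occurrence matrix uses a horizontal neighbour offset and is symmetric and normalized, energy is taken as the square root of the sum of squared entries (some texts use the sum itself), kurtosis is the non-excess fourth standardized moment, the window moves with stride 1 without padding, and the input ROI is assumed to be scaled to [0, 1]. All function names are hypothetical.

```python
import numpy as np

LEVELS = 16  # gray levels per subimage, as stated in the text

def quantize(window, levels=LEVELS):
    """Map a window with values in [0, 1] to integer levels 0..levels-1."""
    return np.minimum((window * levels).astype(int), levels - 1)

def glcm(q, levels=LEVELS):
    """Symmetric, normalized co-occurrence matrix for horizontal neighbours."""
    m = np.zeros((levels, levels))
    left, right = q[:, :-1].ravel(), q[:, 1:].ravel()
    np.add.at(m, (left, right), 1)
    m = m + m.T
    return m / m.sum()

def window_features(window):
    """Return 7 features: contrast, homogeneity, energy, correlation,
    sigma, skew, kurtosis."""
    p = glcm(quantize(window))
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    homogeneity = np.sum(p / (1.0 + (i - j) ** 2))
    energy = np.sqrt(np.sum(p ** 2))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum(p * (i - mu_i) ** 2))
    sd_j = np.sqrt(np.sum(p * (j - mu_j) ** 2))
    if sd_i * sd_j > 0:
        correlation = np.sum(p * (i - mu_i) * (j - mu_j)) / (sd_i * sd_j)
    else:
        correlation = 1.0  # constant window: set to 1 by convention (assumption)
    x = window.ravel()
    sigma = x.std()
    z = (x - x.mean()) / sigma if sigma > 0 else np.zeros_like(x)
    skew = np.mean(z ** 3)
    kurtosis = np.mean(z ** 4)
    return np.array([contrast, homogeneity, energy, correlation,
                     sigma, skew, kurtosis])

def extract_views(roi, size=7):
    """Slide a size x size window over the ROI and return 7 feature maps."""
    h, w = roi.shape
    out = np.zeros((7, h - size + 1, w - size + 1))
    for r in range(h - size + 1):
        for c in range(w - size + 1):
            out[:, r, c] = window_features(roi[r:r + size, c:c + size])
    return out

# Hypothetical 12x12 ROI with random gray values in [0, 1).
rng = np.random.default_rng(0)
views = extract_views(rng.random((12, 12)))
```

Each of the 7 slices of `views` corresponds to one data view. Without padding, the maps are smaller than the ROI, so matching the pixel count stated in the text would require padding the ROI first.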

Figure panels: Swallow Tail of Prodromal PD; Swallow Tail of Healthy Normal; Swallow Tail of Confirmed PD.
Figure 5. The seven views extracted from one MRI.

Our proposed SMC
In this study, we integrate 3 machine learning techniques into our SMC framework to assist in the diagnosis of PD: cross-validation, dimensionality reduction and robust multi-view K-Means clustering. In this section, we review these methods and give the detailed framework of our proposed SMC.

Review of relevant methods
The Cross-Validation technology. Cross-validation is a method widely used in machine learning to build models and verify model parameters [5]. It is a statistical method for evaluating and comparing algorithms by dividing the data into two parts: one for training the model and the other for validating it. In a typical cross-validation, the training and validation sets must cross over in successive rounds so that every data point is validated. The basic form is K-fold cross-validation. Other forms are special cases of K-fold cross-validation or involve repeated rounds of it [30].
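The K-fold scheme described above can be sketched in a few lines. This is an illustrative sketch only: the function name, the shuffling seed and the way folds are formed are assumptions.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists for K-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val

splits = list(k_fold_indices(10, k=5))
```

Each sample appears in exactly one validation fold, so every data point is validated once over the k rounds.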
The Dimensionality Reduction technology. There are two main families of dimensionality reduction methods: unsupervised and supervised. An unsupervised method does not use data labels, so the samples can only be classified or clustered by learning similar features among them, whereas a supervised approach takes class labels into account [31] to obtain more robust classification or clustering results. There are many unsupervised dimensionality reduction techniques, such as independent component analysis (ICA) [32] and non-negative matrix factorization (NMF) [33], but the most widely used one is principal component analysis (PCA) [34]. PCA reduces the dimension of the data while retaining most of the variation in the data set. Many supervised dimensionality reduction techniques have also been proposed, such as mixed discriminant analysis (MDA) [35] and neural networks (NN) [36], but the best known is linear discriminant analysis (LDA) [37]. LDA can improve computational efficiency during data analysis and reduce the overfitting caused by high dimensionality.
The Robust Multi-View K-Means Clustering technology. Many multi-view clustering methods [38] exploit the features from multiple views to improve experimental performance. The Robust Multi-View K-Means Clustering (RMKMC) method selected in this paper extends K-Means clustering with robustness and multi-view knowledge. RMKMC can integrate heterogeneous features for clustering and solve large-scale multi-view clustering problems. Using a common cluster indicator, RMKMC searches for a consensus pattern and clusters across multiple visual feature views. Moreover, this method is robust to outliers in the input data and learns the weight of each view adaptively [18].

Semi-supervised Multi-view learning architecture with Clustering
In this paper, we propose a novel Semi-supervised Multi-view learning framework with Clustering (SMC) based on RMKMC [18], which avoids sampling bias, improves the effect of dimensionality reduction and achieves superior semi-supervised clustering performance. The proposed SMC uses a cross-validation strategy to divide the multi-view samples into labeled training data and unlabeled testing data. The training data is used to train the supervised dimensionality reduction model, and the testing data is then fed into the model to obtain the reduced representation. Finally, a clustering method partitions the test data into several clusters. SMC is not only able to extract effective features from MRIs, but also avoids the need to collect massive data samples in the field of PD. A more detailed description of the SMC algorithm framework is given in Table 1.
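The flow described above (split, fit LDA on the labeled part, transform the unlabeled part, cluster) can be sketched with scikit-learn. This is a hedged sketch, not the authors' implementation: plain K-Means on the concatenated reduced views is used as a stand-in for RMKMC, the stratified 5-fold split is an assumption, and the data are synthetic. The helper name `smc_fold` is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import StratifiedKFold

def smc_fold(views, labels, train_idx, test_idx, n_clusters=3, seed=0):
    """One fold: fit LDA per view on labeled data, then cluster unlabeled data."""
    reduced = []
    for X in views:
        lda = LinearDiscriminantAnalysis(n_components=n_clusters - 1)
        lda.fit(X[train_idx], labels[train_idx])
        reduced.append(lda.transform(X[test_idx]))
    # Stand-in for RMKMC (assumption): concatenate views, run plain K-Means.
    Z = np.hstack(reduced)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(Z)

# Synthetic stand-in data: 3 classes, 2 views, 20 features per view.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 30)
views = [rng.normal(size=(90, 20)) + labels[:, None] * shift
         for shift in (1.0, 2.0)]

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_predictions = [
    (test_idx, smc_fold(views, labels, train_idx, test_idx))
    for train_idx, test_idx in skf.split(views[0], labels)
]
```

The labels of the test indices are never passed to the clustering step; they would only be used afterwards to score the clusters.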

Experimental design
In the experiments, we compare our SMC algorithm with several classical single-view clustering methods, including GMM, K-Means, K-Medoids, AC, Birch and SC, to verify the effect of our multi-view technique. The unsupervised dimensionality reduction method PCA is also used in our experiments for comparison with LDA, to demonstrate the performance of semi-supervised learning in screening PD. In addition, a 5-fold cross-validation strategy is used to train the semi-supervised model [5].
Specifically, to demonstrate that the proposed semi-supervised framework can mine more accurate information and is superior to unsupervised methods, PCA, an unsupervised technique, is used to reduce the dimension of the extracted features [34] for comparison with the SMC framework. The data are divided into a labeled training set and an unlabeled testing set in proportions of 80% and 20%. The results of the two groups of experiments are compared and analyzed across clustering methods and across data views, separately. In addition, the results for each clustering method and data view are compared within the SMC framework alone to show the performance of our model. Figure 6 gives a more detailed view of the experimental design.
Many evaluation indexes exist for clustering experiments, but the requirements of medical disease prediction need to be considered when choosing among them. Three standard clustering evaluation metrics are chosen to measure the clustering performance: Clustering Accuracy (Acc), the Fowlkes-Mallows score (FM) and the Adjusted Rand index (Rand) [39].
Acc measures how closely the clustering result matches the true classes. It is defined as follows:
$$\mathrm{Acc} = \frac{1}{n}\sum_{i=1}^{k} n_i$$
where $n_i$ is the number of data items that are correctly assigned to class $i$, $k$ is the number of classes and $n$ is the total number of data items. The larger the value of Acc, the better the clustering performance.
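Because cluster identifiers are arbitrary, computing Acc in practice requires matching clusters to classes first. The sketch below tries every one-to-one mapping and keeps the best; this matching step is an assumption about how "correctly classified" is determined, and the function name is hypothetical. Brute force is adequate for the three classes considered here.

```python
from itertools import permutations

def clustering_accuracy(y_true, y_pred):
    """Fraction of items correctly assigned under the best cluster-to-class map."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    if len(clusters) > len(classes):
        raise ValueError("more clusters than classes")
    best = 0
    # Try every one-to-one assignment of clusters to classes.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        correct = sum(mapping[p] == t for t, p in zip(y_true, y_pred))
        best = max(best, correct)
    return best / len(y_true)

acc_relabelled = clustering_accuracy([0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2])
acc_partial = clustering_accuracy([0, 0, 1, 1], [0, 1, 1, 1])
```

A clustering that only permutes the class names scores 1.0, while the second example, with one misplaced item out of four, scores 0.75.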
The FM is defined as the geometric mean of the pairwise precision and recall:
$$\mathrm{FM} = \frac{TP}{\sqrt{(TP+FP)(TP+FN)}}$$
where $TP$, $FP$ and $FN$ are the numbers of true positive, false positive and false negative pairs, respectively. The Rand index (RI) requires the actual category information $C$. Assuming $K$ is the clustering result, $a$ represents the number of pairs of elements that are in the same category in both $C$ and $K$, and $b$ represents the number of pairs of elements that are in different categories in both $C$ and $K$. The Rand index is given below:
$$\mathrm{RI} = \frac{a+b}{\binom{n}{2}}$$
where $\binom{n}{2}$ is the total number of element pairs that can be formed in a data set of $n$ items.
The value range of RI is [0,1]. For random results, however, the Rand index is not guaranteed to be close to zero. The Adjusted Rand index (ARI, reported as Rand in this paper) was proposed so that randomly generated clustering results score close to zero, which gives a higher degree of discrimination. ARI is defined as follows:
$$\mathrm{ARI} = \frac{\mathrm{RI} - E[\mathrm{RI}]}{\max(\mathrm{RI}) - E[\mathrm{RI}]}$$
The value range of ARI is [−1,1]. A larger value means that the clustering result is more consistent with the real situation. In a broad sense, ARI measures how well two data distributions fit.
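The pair-counting metrics above can be illustrated with a small pure-Python sketch. The function names are hypothetical, and the ARI is computed with the standard contingency-table form of the chance-corrected index. The sketch assumes at least two samples.

```python
from collections import Counter
from itertools import combinations
from math import comb, sqrt

def pair_counts(y_true, y_pred):
    """Count sample pairs: (TP, FP, FN, TN) with respect to co-membership."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(y_true)), 2):
        same_true = y_true[i] == y_true[j]
        same_pred = y_pred[i] == y_pred[j]
        if same_true and same_pred:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def fowlkes_mallows(y_true, y_pred):
    tp, fp, fn, _ = pair_counts(y_true, y_pred)
    return tp / sqrt((tp + fp) * (tp + fn)) if tp else 0.0

def rand_index(y_true, y_pred):
    tp, fp, fn, tn = pair_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def adjusted_rand_index(y_true, y_pred):
    n = len(y_true)
    cells = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    rows = sum(comb(c, 2) for c in Counter(y_true).values())
    cols = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = rows * cols / comb(n, 2)   # agreement expected by chance
    max_index = (rows + cols) / 2         # largest attainable agreement
    if max_index == expected:
        return 1.0
    return (cells - expected) / (max_index - expected)

truth = [0, 0, 1, 1]
crossed = [0, 1, 0, 1]  # every true pair is split by the clustering
scores = (fowlkes_mallows(truth, crossed),
          rand_index(truth, crossed),
          adjusted_rand_index(truth, crossed))
```

For the crossed example, RI is still 1/3 because pairs separated in both partitions count as agreement, whereas ARI is negative. This illustrates why the adjusted index discriminates better.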

Results
In this section, we first compare the experimental results of different clustering methods with those of our proposed SMC. We then demonstrate the effectiveness of SMC for screening PD. Finally, all the feature views are analyzed with SMC alone for the screening and identification of PD.

The compared results
In this subsection, to verify the effectiveness of the semi-supervised technique, we compare the clustering results obtained using SMC with those obtained using traditional Unsupervised Clustering by PCA (UCP) on 6 single-view clustering methods (GMM, K-Means, K-Medoids, AC, Birch and SC) and 1 multi-view clustering algorithm, namely RMKMC. There are 7 feature views, i.e. Contrast, Homogeneity, Energy, Correlation, Sigma, Skew and Kurtosis, whose dimensionality is reduced by our proposed SMC and by traditional PCA respectively. We run each baseline algorithm on the data of each view, and the compared results of SMC and PCA for the different clustering methods are presented in Figure 7. To show that our proposed model is good at feature extraction and at enhancing clustering performance, we also apply SMC and UCP to each single feature view, and the compared results for the different views are displayed in Figure 8.
Figure 7 shows the comparison results of different clustering algorithms after dimensionality reduction using our SMC framework and UCP. In this figure, we average the clustering results over all 7 views (i.e. Contrast, Homogeneity, Energy, Correlation, Sigma, Skew and Kurtosis) for the baselines, except for the multi-view clustering algorithm RMKMC. For the single-view algorithms, the Acc, FM and Rand values clearly show that the SMC framework captures the significant features among MRIs better than directly using UCP for dimensionality reduction and clustering. As for the multi-view method, our SMC also performs better, which illustrates that SMC can exploit the LDA to train a [...]
Figure 8 shows the comparison of clustering results for different feature views after dimensionality reduction with our SMC and UCP. In this figure, the clustering values of each feature view are the average values over GMM, K-Means, K-Medoids, AC, Birch, SC and RMKMC. The dimensionality reduction results from our proposed SMC framework are clearly more appropriate than those from UCP on the different evaluation metrics. Moreover, it is worth noting that our SMC is more stable than UCP when extracting effective information from different views.

The discovery of efficient clustering methods and feature view for screening PD
To find the best clustering algorithm for screening PD and demonstrate the superiority of multi-view learning, we also compare the performance of different clustering methods within our proposed SMC. The detailed results are shown in Tables 2, 3 and 4.
Table 2 shows the Acc results of different clustering algorithms with different feature views after using the SMC framework. The multi-view clustering method is superior to the other baselines, at around 74±8.9, which illustrates that multi-view algorithms are capable of capturing sufficient features. Among the single-view clustering methods, the algorithm with the best average effect over the 7 feature views is K-Means in most cases, at 65.7±13.6. The best average effect over the six single-view clustering algorithms is obtained on the Correlation feature view data set, at 65.8±16.0. For the single-view clustering methods, all the highest accuracy values are shown in bold. All the single-view clustering methods obtain their highest Acc value on the Correlation feature view data set. The best single-view clustering Acc for each of the seven feature views is also shown in bold.
Table 3 shows the FM results of different clustering algorithms and different feature views after using the SMC framework. Similarly, the best FM value is 0.59±0.11, obtained by combining the different feature views with RMKMC. All the baselines achieve their best performance on the Correlation feature view, at about 0.53±0.10. This shows that the Correlation view contains more effective and important information than the others. Among the six single-view clustering methods, the average effect of the SC clustering method is superior to that of the other baselines.
Table 4 shows the Rand results of different clustering algorithms on different feature views after using the SMC framework. The multi-view clustering algorithm RMKMC combined with the different features is again the best, with a Rand value of 0.38±0.17. The best average Rand value over the six single-view clustering methods comes from the Correlation feature view, at 0.26±0.18. Overall, K-Means has the best Rand performance among the 6 single-view algorithms, at 0.25±0.17.
Tables 2, 3 and 4 demonstrate that the Correlation feature view is more essential and contains more effective information than the other six feature views for screening PD under our proposed SMC. The K-Means clustering method achieves the best overall performance among all the single-view clustering models, not only in Table 2 but also in Tables 3 and 4. Moreover, owing to its ability to integrate different features from multiple views, RMKMC is more suitable for mining vital features from MRI data and for enhancing clustering performance. Thus, the SMC framework based on RMKMC is capable of screening PD and assisting in auxiliary diagnosis.

Discussion
With the development of artificial intelligence technology, the medical industry is paying more and more attention to using related methods to help prevent and diagnose diseases. The global population is aging noticeably, and PD commonly occurs in the elderly. Therefore, this study uses machine learning to screen for PD from human brain MRI data sets, which benefits both prevention and auxiliary diagnosis. The experimental data set in this work is derived from an open data research platform, i.e. PPMI.
In this paper, we proposed a novel semi-supervised multi-view learning architecture with clustering (SMC) for screening PD. SMC is a clustering learning framework based on LDA, the cross-validation method and the RMKMC method. However, our SMC can also integrate single-view clustering methods, i.e. GMM, K-Means, K-Medoids, AC, Birch and SC. In the experiments, we first preprocess the three-class data sets (non-patients, prodromal PD and confirmed PD) by obtaining gray value images, removing noise points, etc. The 7x7 sliding window method is then used to extract the 7 different feature data views: Contrast, Homogeneity, Energy, Correlation, Sigma, Skew and Kurtosis. Finally, the dimensionality of all the data sets is reduced by the SMC model and by traditional PCA respectively, and the results are compared across different clustering methods. The comparison and analysis of the experimental results demonstrate that our proposed SMC framework outperforms the traditional models.
In real examination and diagnosis of PD, only part of the brain MRI images are labeled. The experimental results of SMC are therefore meaningful, because they provide a semi-supervised machine learning technique that can assist in the diagnosis of PD when only some data labels are available. In addition, this technology could be used in physical examinations of the elderly to screen for latent PD, providing an early diagnosis and reducing the incidence of PD.

Conclusion
In the modern medical industry, computer-aided diagnosis technologies applied to brain MRI data (or other types of medical data) are valuable for helping physicians screen for PD and reach an accurate diagnosis and treatment. In this paper, we proposed a novel semi-supervised multi-view clustering architecture (denoted as SMC). SMC combines LDA, the cross-validation idea, and semi-supervised, multi-view learning and clustering technologies. The original data sets from the PPMI database were first preprocessed with the techniques described above. The 7 view data sets of different features were then extracted from the preprocessed data sets by the 7x7 sliding window method. Next, SMC learns a new reduced-dimension representation of the multiple data views by training with the LDA method. After training, SMC can effectively screen non-patients, prodromal PD and confirmed PD with different clustering models. Extensive experimental results demonstrated that the proposed model outperforms the traditional unsupervised clustering methods.
As we only evaluate our proposed SMC on MRI data in this work, in the future we will work on collecting text views related to other fields, explore the relationship between different views and apply our model to these data sets. In addition, we will improve SMC to achieve better clustering results for screening PD by considering the different importance of distinct feature views.