Bipartite graph modeling of Alzheimer’s disease and its early automated discrimination through region-based level set algorithm and support vector machine in magnetic resonance brain images

This paper offers a bird's-eye view of how bipartite graph modeling could help in understanding the progression of Alzheimer's disease (AD). We also discuss the software tools available in the literature for identifying bipartite structure in the brain networks of AD patients, together with a general procedure for generating a graph from an AD brain network. Further, as AD is a menacing disorder that leads to a progressive decline of memory and physical ability, we resort to Computer-Aided Diagnosis (CAD), which plays a vital part in the preliminary assessment and detection of AD. We propose an approach to detect AD, particularly in its early phase, by analyzing measurable variations in the hippocampus, grey matter, cerebrospinal fluid and white matter of the brain in magnetic resonance images. To this end, appropriate segmentation and classification methods are proposed to detect the presence of AD. Experiments were carried out on magnetic resonance images to isolate the regions of interest. The effectiveness of the CAD system was evaluated experimentally on images taken from publicly available databases. The findings suggest that the proposed CAD system has considerable potential and promise for the prognosis of AD.


INTRODUCTION
In this big data era, information in the biological sciences is clearly growing at an exponential rate. Results from many databases fed by high-throughput experiments grow rapidly with every passing day, so genomics is predicted to see a great deal of activity and to dominate related branches across the big-data spectrum. A pertinent property of bipartite networks, biomedical ones in particular, is that, in contrast to other networks, they abstract the entities they contain and use data integration techniques to incorporate data from various sources such as clinical symptoms, diseases and pharmaceutical drugs [1,2]. This creates the need for publicly accessible biological databases that contain data of appreciable quality. Such databases have a decisive role in bioinformatics, as they provide ample opportunity to access relevant biological data [3]. Gene-disease association studies, a basis of biomedical networks, are quite difficult because their results are often not replicable [4,5]. Indispensable resources include the OMIM (Online Mendelian Inheritance in Man) and GWAS (genome-wide association study) catalogs. Present-day computing power permits the analysis of giant networks, but visualization and scalability remain very hard [6].
A graph G with vertex set V and edge set E is said to be bipartite if V = S ∪ T with S ∩ T = ∅ and every edge in E has one end in S and the other end in T. The pair (S, T) is called a bipartition of the vertex set V of G. Equivalently, bipartite graphs are exactly the graphs with no odd cycle, and in vertex-coloring terminology they are the 2-colorable graphs. Trees, and more generally acyclic graphs, as well as cycle graphs with an even number of vertices, are bipartite. If |S| = |T|, G is called balanced bipartite. If all vertices of S have the same degree and all vertices of T have the same degree, G is called a biregular bipartite graph. An AD brain network can be thought of as a bipartite graph G = (S ∪ T, E) in which the elements of S stand for causative factors of the disease and the elements of T stand for the associated genes. A causative factor and a gene are joined by an edge if that gene is affected by that particular causative factor. So, given an AD-affected brain network that is bipartite, or the carefully determined largest bipartite subnetwork of the given network, one can construct an AD causative factor network by projection onto the causative factors, based on the genes they affect. As bipartite graphs possess their own attributes, it would be very useful to develop viable, scalable clustering algorithms that exploit their topology. Layout, analysis and visualization methods tailored to bipartite graphs, and generalized to n-partite graphs, could contribute immensely to closing a large gap in biomedical science. For an interesting note on topological features of bipartite graphs, see [7][8][9].
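The equivalence between bipartiteness and 2-colorability stated above can be checked directly with a breadth-first search. The following is a minimal sketch in plain Python, not a procedure taken from the paper; the factor and gene labels are hypothetical placeholders chosen only for illustration.

```python
from collections import deque

def two_color(adjacency):
    """Return a {vertex: 0 or 1} coloring if the graph is bipartite, else None."""
    color = {}
    for start in adjacency:
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in color:
                    color[v] = 1 - color[u]  # neighbours get the opposite color
                    queue.append(v)
                elif color[v] == color[u]:
                    return None  # two adjacent vertices share a color: odd cycle
    return color

# Hypothetical toy network: causative factors (set S) joined to genes (set T).
edges = [("smoking", "gene_1"), ("diet", "gene_1"), ("diet", "gene_2")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

coloring = two_color(adjacency)            # color 0 plays the role of S, color 1 of T
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
not_bipartite = two_color(triangle)        # a 3-cycle is odd, so no coloring exists
```

The two color classes returned for a bipartite input give one valid bipartition (S, T); for a disconnected graph the bipartition is not unique.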

CERTAIN GRAPH STRUCTURAL MEASURES
Closeness centrality conveys whether a vertex can interact with the other vertices of a network via short paths. The more central a vertex is, the closer it is to all other vertices. The measure is inversely proportional to the shortest-path lengths between the vertex and the other vertices. In a bipartite graph, a vertex is at distance at least one from the vertices of the other partite set and at least two from the vertices of its own set. Consequently, all paths between vertices of the same set have even length. This attribute complicates the determination of various other measures.
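As an illustration of the description above, closeness can be computed from breadth-first distances. This sketch assumes the common normalization (number of reachable vertices divided by the sum of their distances), which the text does not specify; the four-vertex path is a hypothetical example.

```python
from collections import deque

def bfs_distances(adjacency, source):
    """Shortest-path lengths (in edges) from source to every reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def closeness(adjacency, vertex):
    """Reachable vertices divided by the sum of their distances (assumed normalization)."""
    dist = bfs_distances(adjacency, vertex)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0

# Hypothetical bipartite path s1 - t1 - s2 - t2 with S = {s1, s2} and T = {t1, t2}.
path = {"s1": {"t1"}, "t1": {"s1", "s2"}, "s2": {"t1", "t2"}, "t2": {"s2"}}

closeness_s1 = closeness(path, "s1")    # distances 1, 2, 3
closeness_t1 = closeness(path, "t1")    # distances 1, 1, 2
same_set_distance = bfs_distances(path, "s1")["s2"]   # even, as noted in the text
```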
Vertices with large betweenness centrality act as links between densely connected communities. To compute betweenness centrality reliably, shortest paths are found between all pairs of vertices, and a vertex's centrality value increases each time it lies on a shortest path. Vertices with large betweenness centrality values thus act as brokers between two or more adjacent groups. In a bipartite graph, shortest paths can begin and end at vertices of either partite set. Next, eigenvector centrality identifies the vertices that are joined to important vertices, such as hubs, in a network. The eigenvector centrality of a vertex is proportional to the sum of the eigenvector centralities of the vertices adjacent to it. For more on bipartite eigenvector centrality, see [10].
The global clustering coefficient measures the tendency of a network to form dense clusters, and the local clustering coefficient measures the tendency of a vertex to become a member of a cluster. Note that applying these two clustering coefficients directly to a bipartite network conveys no meaning [11]: the usual clustering coefficients are uninformative for bipartite networks because cycles of length 3 are not present. For measures that avoid this issue in two-mode networks, see [12][13][14][15][16].

SOME TOOLS FOR BIPARTITE NETWORK MODELING
Software and libraries for the visualization and analysis of bipartite networks can be found in the literature. Generally, application-specific software is required as a plug-in to an available tool. Cytoscape, an open-source software platform geared towards bioinformatics, is used for the interactive visualization of bipartite networks [17,18]. The vertices of the disjoint sets S and T of a bipartite graph G = (S, T, E) can be selected, placed separately, and arranged through grid, circular or hierarchical layouts.
One can assign a unique color to each group and track the vertices in every layer. DisGeNET [19] is a Cytoscape plug-in designed specifically to explore networks of disease-gene associations. It gives access to a disease-gene database comprising comprehensive data acquired from different resources. DisGeNET presents disease-gene networks as bipartite graphs and gives the option to view both the gene-gene and the disease-disease networks extracted from the diseasome. Its latest search option allows the creation of sub-networks and facilitates the exploration of diseases linked to common genes. BiLayout is a Java-based plug-in that computes a bipartite network layout for two groups of vertices. It permits actions such as choosing one of the groups, showing and hiding unconnected vertices, exporting sets of vertices and resetting the network. A customized view of all vertices adjacent to a given vertex is available through a mouse-over effect. Pajek, a non-commercial 32-bit program package for Windows for both analysis and visualization, is freely available and can handle massive networks with up to one billion vertices and an unlimited number of edges [20]. It provides various ways to visualize bipartite graphs and the means to compute their one-mode projections. NetworkX [20] is another software option for processing bipartite graphs and other graphs. Here a vertex attribute called 'bipartite', with values 0 or 1, identifies the set to which each vertex belongs. Care must be taken to confirm the absence of links between vertices in the same set. Although NetworkX requires user participation in the creation of bipartite networks, it offers different options for the projection, description and drawing of bipartite networks. UCINET [21] is commercial software for Windows meant for the analysis of social network data. It comes with the NetDraw tool for visualizing bipartite networks.
Gephi [22] is another well-known and widely used open-source program for the visualization of networks and graphs. FALCON, with code available in R, MATLAB and Octave, lets us set up null models with little effort. Arena3D [23] is a visualization tool in which vertices can be placed on separate layers and assigned different colors, which makes such networks easier to view. Circos [24] is a genomics tool for visualizing structural variations and comparing genomes. It uses a circular ideogram layout to display links between pairs of positions as ribbons that encode the size, orientation and position of the connected genomic elements. Other tools for visualizing bipartite graphs include graphVizdb, OndexProviz, VizANT, GUESS, UCINET, MAPMAN, PATIKA, Medusa and Osprey for two dimensions and BioLayout Express for three dimensions [25][26][27][28][29][30][31][32][33][34][35]. R offers several packages for handling bipartite networks. They provide various options for computing a series of structural indices that describe a bipartite network. Since they can also simulate graphs with given attributes, they may be used to compare results against null models. Netpredictor [36] is an R package for predicting missing links in bipartite networks. It further permits the determination of different bipartite network attributes, the computation of significant interactions between two sets of vertices through permutation-based testing, and the visualization of groups for two distinct sets of vertices. biGRAPH [37] is an R extension of the widely known igraph package that provides a set of techniques for the analysis of bipartite graphs, including projection and the handling of information loss. Biomedical networks link entities such as diseases, causative factors, symptoms and genes. The main aim is to explain direct interactions within the same group.
Projecting a bipartite network onto a unipartite network also results in a loss of information.
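The information loss caused by projection can be made concrete with a small example. The sketch below, in plain Python, projects a bipartite edge list onto its first vertex set by linking two vertices whenever they share a neighbour; the vertex labels are hypothetical. Two different bipartite graphs are shown to yield the same projection, so the original structure cannot be recovered from the projected network alone.

```python
from itertools import combinations

def project(edges):
    """Project bipartite edges (s, t) onto S: join two S-vertices sharing a T-neighbour."""
    members_of = {}
    for s, t in edges:
        members_of.setdefault(t, set()).add(s)
    projected = set()
    for members in members_of.values():
        for a, b in combinations(sorted(members), 2):
            projected.add((a, b))
    return projected

# Hypothetical graph 1: three factors all affect one gene.
graph_one = [("f1", "g1"), ("f2", "g1"), ("f3", "g1")]

# Hypothetical graph 2: each pair of factors shares a different gene.
graph_two = [("f1", "g1"), ("f2", "g1"),
             ("f2", "g2"), ("f3", "g2"),
             ("f1", "g3"), ("f3", "g3")]

projection_one = project(graph_one)
projection_two = project(graph_two)   # the same triangle on f1, f2, f3
```

Weighting each projected edge by the number of shared neighbours reduces, but does not remove, this ambiguity.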

PROCEDURE TO GENERATE A GRAPH FROM AD AFFECTED BRAIN NETWORK
AD, one form of dementia, is a progressive and irreversible brain disease. Graph theorists rely on various imaging techniques to generate a graph from an AD brain network. Images obtained from magnetic resonance data are split into several areas of interest called regions. The data are then subjected to diffusion tensor imaging and tractography to compute a structural connectivity matrix. The AD brain network generated in this way is then treated with graph-theoretical methods. We intend to demonstrate in the near future that small-world attributes and global efficiency measures are lower in AD patients than in healthy persons. For this purpose, our chosen subjects are AD patients in Tamilnadu, one of the states of India. We propose to examine the validity of causative factors such as food, smoking habit, drinking habit, lifestyle and hereditary history, to identify the genes that are affected by these factors, to model the phenomenon using bipartite graphs, and to exploit the attributes of such graphs to find a strategy for delaying the onset of AD.
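One simple way to pass from a structural connectivity matrix to a graph is to keep an edge between two regions whenever their connection strength reaches a threshold. The sketch below illustrates this in plain Python; the thresholding rule, the threshold value and the matrix entries are all assumptions made for illustration and are not taken from the procedure described above.

```python
def matrix_to_graph(matrix, threshold):
    """Build an undirected adjacency map from a symmetric connectivity matrix."""
    n = len(matrix)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):           # upper triangle only; no self-loops
            if matrix[i][j] >= threshold:
                adjacency[i].add(j)
                adjacency[j].add(i)
    return adjacency

# Hypothetical connectivity strengths between four brain regions.
connectivity = [
    [0.0, 0.8, 0.1, 0.0],
    [0.8, 0.0, 0.6, 0.2],
    [0.1, 0.6, 0.0, 0.7],
    [0.0, 0.2, 0.7, 0.0],
]

brain_graph = matrix_to_graph(connectivity, threshold=0.5)  # threshold is an assumption
```

Graph measures such as the centralities discussed earlier can then be computed on the resulting adjacency structure.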

OBJECTIVES
a. Probe and analyze the complex brain networks of Alzheimer's disease (AD) patients and their dynamics at a large scale, through a graphical representation of the discretized version of the continuum of brain networks and the values of standard structural parameters such as betweenness centrality, closeness centrality, eigenvector centrality and clustering coefficient, as this will provide more insight into the gradual progression of this dreaded disease, a silent killer.
b. Probe the possible role of bipartite graphs and their impact on the AD-affected brain network, and prepare for a more detailed study elsewhere of how the topological properties of such graphical networks could be applied to biologically oriented case studies of AD patients.
c. List the methods available in the literature and a viable list of software for bipartite graph visualization. This may help our further exploration elsewhere for novel patterns, clarify how bipartite graphs and software tools can be utilized, and pave the way to solving some challenging tasks associated with understanding the progression of AD.
d. List the procedure for generating a graph from the brain network of AD patients.
e. Access permitted database sources, obtain images of AD patients in India, and perform in sequence 1) MRI image acquisition, 2) pre-processing, 3) segmentation, 4) feature extraction and 5) classification, and report the results.

LITERATURE SURVEY
AD is a slowly progressive disorder that destroys memory and thinking abilities and, ultimately, the ability to carry out simple daily activities. AD is the most common cause of dementia among older people [38]. India has the second-largest number of people in the world living with some form of dementia, which is closely associated with Alzheimer's disease. According to 'Dementia India', about 5.3 million people are living with dementia. Globally, 50 million people are affected by dementia. Initially the changes are barely noticeable; gradually the disease affects memory and ultimately leads to physical disability. Its course may be described in three phases: preclinical AD, Mild Cognitive Impairment (MCI) and Alzheimer's dementia, the last of which is further categorized into mild, moderate and severe AD [39]. In the preclinical phase, measurable variations may be noticed in neuroimages, but since the brain can compensate for slight changes, memory is not affected and individuals function normally. The slight decline in memory called MCI differs from age-related cognitive decline. In the mild AD phase, the majority of people can act independently, with a slight decline in thinking. The moderate stage is a lengthy phase in which individuals gradually lose their ability to act independently and show changes in behavior such as campaigning, deviousness etc. In the final, severe stage, individuals need complete assistance with daily activities and become bed-bound [39].
AD is very difficult to recognize in its earliest stages, and doing so requires a precise screening procedure. AD is strongly associated with loss of brain volume, known as atrophy, and with the formation of abnormalities in the White Matter (WM) and Grey Matter (GM), such as White Matter Hyperintensities (WMH) [40]. The hippocampus, a tiny curved structure of the medial temporal lobe, is the first region to be affected [41]. The hippocampus is responsible for forming new memories, for cognitive learning skills and for emotions. Therefore, it is feasible to detect AD and determine its level by carefully examining the volumes of the hippocampus, WM, GM and Cerebrospinal Fluid (CSF) using Magnetic Resonance Imaging (MRI), which is widely preferred over other neuroimaging modalities for acquiring structural 3D brain images [42,43].
Extensive efforts in this area show that atrophy identifies the vulnerable regions of the affected brain, specifically the hippocampus, and reflects the severity of AD [44]. A large body of work has been proposed for the very early detection of AD and MCI. Structural Magnetic Resonance Imaging (sMRI) has been used to determine atrophy [45][46][47][48][49], Fluorodeoxyglucose Positron Emission Tomography (FDG-PET) has been used to characterize metabolic brain variations [50][51], and abnormal amyloid accumulation has been detected from CSF [52][53]. Because the different biomarker modalities offer different information, this may limit classification performance. Classification has mostly focused on features such as voxel-based morphometry [54][55][56], cortical thickness [57][58][59] and hippocampal volume [60][61][62].
Manual detection is time-consuming and may lead to misinterpretation. Hence, automatic detection on advanced, well-featured computer platforms is essential for identifying the AD biomarkers or features of interest. Accurate identification of AD, particularly at an early warning phase such as MCI, has become a decisive effort that may help the neurologist to postpone or even prevent dementia. Computer Aided Detection (CAD) is a computer-based process designed to analyze medical images for suspicious areas. The prime goal of CAD is to improve the recognition of AD by reducing the False Negative Rate (FNR) caused by observational oversight. This research work presents a CAD methodology for identifying the AD biomarkers. Features are extracted from the segmented AD biomarkers and then classified to detect AD. The present study aims to develop a well-organized approach for the early detection of AD that comprises pre-processing, segmentation, feature extraction and classification, and to find an appropriate solution for clinical use, particularly in early diagnosis, by designing efficient segmentation and feature extraction algorithms that improve the recognition rate.

METHODOLOGY
We pursue this objective in five stages: 1) MRI image acquisition, 2) pre-processing, 3) segmentation, 4) feature extraction and 5) classification (depicted in Figure 1).

Pre-processing
Image quality and accuracy are central to this work. Image quality evaluation and enhancement are performed during pre-processing, which is essential for the subsequent processing stages. Pre-processing involves registration, skull stripping and histogram normalization. Image registration aligns the input image with a reference in order to overcome issues such as rotation, scale and skew when input images are overlaid. The objective of skull stripping is to remove the thick outer layer (the skull) surrounding the cortex and cerebellum. Histogram normalization is then carried out to adjust the image intensities and obtain high contrast.
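The intensity adjustment step can be illustrated with a linear contrast stretch, which rescales the darkest pixel to the bottom and the brightest pixel to the top of the output range. This is a simple stand-in chosen for illustration; the text does not specify which normalization was used, and the tiny image and the 0 to 255 output range below are assumptions.

```python
def stretch_contrast(image, new_min=0, new_max=255):
    """Linearly rescale pixel intensities so they span [new_min, new_max]."""
    flat = [pixel for row in image for pixel in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A constant image has no contrast to stretch.
        return [[new_min for _ in row] for row in image]
    scale = (new_max - new_min) / (hi - lo)
    return [[round(new_min + (pixel - lo) * scale) for pixel in row] for row in image]

# Hypothetical low-contrast 2 x 2 image with intensities between 50 and 100.
low_contrast = [[50, 60],
                [70, 100]]

enhanced = stretch_contrast(low_contrast)
```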

Segmentation
Segmentation is an important procedure in medical image analysis. Segmented images make analysis easier, since the derived regions carry more significance than the whole image. The GM, WM, CSF and hippocampus regions are segmented using a region-based level set algorithm [63], in which contours or surfaces are represented as the zero level set of a higher-dimensional function known as the level set function. With this representation, the image segmentation problem can be formulated and solved in a principled manner. A benefit of the level set method is that numerical calculations involving surfaces and curves can be carried out on a fixed Cartesian grid.
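For orientation, the level set representation and one widely used region-based energy can be written as follows. The energy shown is the piecewise-constant Chan-Vese model, given here only as a representative example; it is an assumption, since the text does not state the functional used in [63]. Here I is the image on the domain Ω, φ is the level set function, H and δ are the Heaviside and Dirac functions, c_1 and c_2 are the mean intensities inside and outside the contour, and μ, λ_1, λ_2 are positive weights.

```latex
C(t) = \{\, x \in \Omega : \phi(x, t) = 0 \,\}

E(c_1, c_2, \phi) = \mu \int_{\Omega} \delta(\phi)\,|\nabla \phi|\,dx
  + \lambda_1 \int_{\Omega} |I(x) - c_1|^2\, H(\phi)\,dx
  + \lambda_2 \int_{\Omega} |I(x) - c_2|^2\, \bigl(1 - H(\phi)\bigr)\,dx
```

The first term penalizes contour length, while the other two measure how well the two regions are approximated by constant intensities; minimizing E with respect to φ moves the zero level set towards region boundaries.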

Feature extraction
Feature extraction is the process of representing raw images in a condensed form to enable decisions such as pattern classification. The gray-level co-occurrence matrix (GLCM) is a well-established second-order statistical technique that is employed to extract texture and morphological features from the region of interest. Volumetric features from the cortical and subcortical regions and the volume of the hippocampus were extracted and considered.
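The GLCM counts how often a pixel with gray level i has a neighbour with gray level j at a fixed offset, and texture features are then computed from the normalized counts. The sketch below builds a non-symmetric GLCM for the right-hand neighbour and derives the contrast feature; the offset, the number of gray levels and the small image are assumptions for illustration, not the settings used in this work.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalized co-occurrence matrix for the pixel offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("no pixel pairs exist for this offset")
    return [[value / total for value in row] for row in counts]

def glcm_contrast(p):
    """Contrast: co-occurrence probabilities weighted by squared gray-level difference."""
    n = len(p)
    return sum(p[i][j] * (i - j) ** 2 for i in range(n) for j in range(n))

# Hypothetical 3 x 3 region of interest quantized to three gray levels (0, 1, 2).
region = [[0, 0, 1],
          [1, 2, 2],
          [0, 1, 2]]

matrix = glcm(region, levels=3)
contrast = glcm_contrast(matrix)
```

Other common GLCM features, such as energy and homogeneity, are computed from the same matrix with different weights.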

Classification
Classification is the final stage of the proposed CAD approach. The extracted features are fed into a Support Vector Machine (SVM) classifier. This supervised classifier distinguishes persons with AD by using a training data set to obtain a separating hyperplane in an n-dimensional space with a Radial Basis Function (RBF) kernel, since it performs well on smaller datasets than other classifiers.
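The RBF kernel measures similarity between two feature vectors as exp(-γ‖x − y‖²), and a trained SVM classifies a new sample by the sign of a kernel-weighted sum over its support vectors. The sketch below shows both pieces in plain Python; the support vectors, coefficients, bias and γ are hypothetical values, not parameters learned in this study, and no training is performed.

```python
import math

def rbf_kernel(x, y, gamma):
    """RBF kernel value exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

def decision_value(x, support_vectors, dual_coefs, bias, gamma):
    """SVM decision function; each dual coefficient is alpha_i times the label y_i."""
    weighted = sum(coef * rbf_kernel(sv, x, gamma)
                   for sv, coef in zip(support_vectors, dual_coefs))
    return weighted + bias

# Hypothetical trained model with one support vector per class.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
dual_coefs = [1.0, -1.0]
bias = 0.0
gamma = 0.5

score_near_first = decision_value([0.0, 0.0], support_vectors, dual_coefs, bias, gamma)
score_near_second = decision_value([2.0, 2.0], support_vectors, dual_coefs, bias, gamma)
```

A positive score assigns the sample to the class of the first support vector and a negative score to the other class.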

FINDINGS AND DISCUSSION
The present work was implemented in MATLAB and the results were evaluated. Images were taken from normal individuals (n = 146, aged 75.19 ± 4.48 years) who had been tracked for three years and from individuals with MCI (n = 98, aged 74.1 ± 6.05 years) who had converted to AD within three years. The trained SVM classifier distinguished patients with AD. Figure 2 (a) depicts the axial projection of an MRI image. During image processing, the pixels of the skull area hinder analysis of the region of interest and further processing; hence the skull was removed using brain extraction tools, as depicted in Figure 2 (b). In general, low pixel intensities impair proper segmentation; hence image enhancement was performed, as shown in Figure 2 (c). Segmented images are shown in Figure 2 (d) and (e). The proposed method was statistically evaluated against gold standard data, and the obtained accuracy was 90%, sensitivity was 91% and specificity was 91.6%. The proposed work is compared with existing methods in Table 1.
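For reference, accuracy, sensitivity and specificity are all derived from the four cells of a confusion matrix. The sketch below shows the standard definitions in plain Python; the counts are hypothetical and are not the counts underlying the results reported above.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),   # all correct decisions
        "sensitivity": tp / (tp + fn),                 # AD cases correctly detected
        "specificity": tn / (tn + fp),                 # normal cases correctly cleared
    }

# Hypothetical counts: 50 AD cases and 50 normal controls.
metrics = diagnostic_metrics(tp=45, fn=5, tn=40, fp=10)
```

The false negative rate targeted by CAD is the complement of sensitivity, fn / (tp + fn).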

CONCLUSIONS
In this note, we have outlined our plan of research in the near future for AD brain network analysis. We hope to 1) determine whether bipartite methods or distance metrics developed for flow, nestedness, community detection and modularity analysis, as well as coloring concepts for distance-based graphs, could be used in biomedical networks such as the AD network, and 2) use other popular imaging techniques, such as functional magnetic resonance imaging (fMRI), electroencephalography (EEG) and positron emission tomography (PET), to generate graphs for AD-affected brain networks, compare them with the graph obtained from MRI, repeat the entire computation and complete a comparative study. We also intend to carry out such a comparative study with subjects of other origins and ethnicities. Ultimately, the intention is to develop a strategy for delaying the onset of this dreaded disease. Further, we presented a methodology for detecting AD from measurable volumetric changes in the affected brain. Input MRI images taken from databases were evaluated experimentally. The images were first pre-processed and segmented to distinguish normal from affected individuals. Volumetric features from the cortical and subcortical regions and the volume of the hippocampus were extracted and used for training. An SVM classifier with an RBF kernel was used to find the separating hyperplane in an n-dimensional space. The proposed approach offers 90% accuracy, 91% sensitivity and 91.6% specificity.