Preliminary Set Theory-Type Analysis of Proteins Associated With Parkinson’s Disease

In an attempt to create a model of Parkinson’s disease (PD), eighty-three proteins whose entries made some mention of PD were extracted from the Swiss-Prot protein database. These were split into various subsets, three of which are the focus here: PARK, proteins with some indication that polymorphisms in the protein might increase a person’s susceptibility to developing PD; MITOCHOND, proteins that had some association with the mitochondria; and MT-C1D, proteins that were implicated in mitochondrial complex 1 deficiency. The PARK subset contained 21 of the 83 proteins (21/83), MITOCHOND 33 (33/83) and MT-C1D 17 (17/83). The results could be used to build a basic model of PD in which phenotypes are based on sets of proteins. The main phenotypes established here are non-mitochondrial PD (50/83) and mitochondrial PD (33/83). Further division is possible depending on whether proteins have polymorphisms that increase susceptibility to developing PD. MT-C1D seems to be independent of the PARK set. This is a very simplistic attempt to model Parkinson’s disease at the proteomic level, and further work will be needed to build a more complex and realistic proteomic model of PD.
*Corresponding author: Paul Whitesman, Disease Motifs, Rochdale, Greater Manchester, England, Tel: +44 (0) 1706 343120; E-mail: paulwhitesman@diseasemotifs.co.uk
Received September 29, 2014; Accepted November 11, 2014; Published November 18, 2014
Citation: Whitesman P (2014) Preliminary Set Theory-Type Analysis of Proteins Associated With Parkinson’s Disease. J Alzheimers Dis Parkinsonism 4: 170. doi: 10.4172/2161-0460.1000170
Copyright: © 2014 Whitesman P. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Journal of Alzheimer’s Disease & Parkinsonism


Introduction
Various diseases are known to have genetic polymorphisms that increase the likelihood of, or susceptibility to, developing the disease. Some proteins are linked to a particular disease in one of several ways: through a polymorphism that leads to increased susceptibility; through increased or decreased expression in the disease; or through appearing in some pathological feature, as in Parkinson's disease (PD), where alpha-synuclein can form part of Lewy bodies [1]. This is an introduction to 'set theory-type' analysis of some proteins associated with Parkinson's disease. The set theory-type proteomic analysis here is less a mathematical treatment of the pathological process of PD than a simple attempt to group proteins into sets so as to get a clearer picture of the pathology that may be involved in the disease process. This disease process can be referred to as a disease network of interacting proteins. The grouping may also be used to form a proteomic foundation for categorising particular phenotypes attributed to a particular disease type.
This work focuses on PD, in which there is evidence of some involvement of the mitochondria [2]. There are also conditions such as mitochondrial complex 1 deficiency (MT-C1D) which, as well as being a condition in its own right, has been associated with progression to PD [3][4][5].

Method
The Swiss-Prot human protein database was downloaded from the EBI website on 29th October 2014. A Perl script was written that extracted any protein data-sheet that mentioned 'Parkinson' in the comments section (CC) or feature table (FT) of the data-sheet. Originally 86 proteins were found, but three of these, AAKG2_HUMAN, FGF20_HUMAN and SYUB_HUMAN, were considered to be false positives, leaving 83.
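The extraction step can be illustrated with a minimal Python sketch. This is not the author's Perl script; it only assumes the standard Swiss-Prot flat-file layout, in which each line starts with a two-letter line code (ID, CC, FT and so on) and each entry ends with a '//' line. The function names and the three sample entries (PROT1_HUMAN to PROT3_HUMAN) are hypothetical and invented purely for illustration.

```python
def iter_entries(flat_text):
    """Yield each entry as a list of lines; an entry ends at a '//' line."""
    entry = []
    for line in flat_text.splitlines():
        if line.startswith("//"):
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)
    if entry:  # tolerate a final entry with no terminator
        yield entry


def entry_name(entry):
    """Return the entry name from the ID line (e.g. 'PROT1_HUMAN')."""
    for line in entry:
        if line.startswith("ID "):
            return line.split()[1]
    return None


def mentions(entry, term, line_codes=("CC", "FT")):
    """True if `term` occurs on any line carrying one of the given line codes."""
    return any(line[:2] in line_codes and term in line[2:] for line in entry)


def select(flat_text, term, line_codes=("CC", "FT")):
    """Names of all entries that mention `term` in the chosen sections."""
    return {entry_name(e) for e in iter_entries(flat_text)
            if mentions(e, term, line_codes)}


# Hypothetical, made-up entries; not real Swiss-Prot records.
SAMPLE = """\
ID   PROT1_HUMAN             Reviewed;         100 AA.
DE   RecName: Full=Hypothetical protein 1;
CC   -!- DISEASE: Parkinson disease (PD): illustrative text only.
//
ID   PROT2_HUMAN             Reviewed;         200 AA.
DE   RecName: Full=Parkinson-named protein, no CC or FT mention;
CC   -!- FUNCTION: Illustrative text only.
//
ID   PROT3_HUMAN             Reviewed;         300 AA.
FT   VARIANT         10
FT                   /note="A -> T (in Parkinson disease; illustrative)"
//
"""

pd_set = select(SAMPLE, "Parkinson")                  # CC or FT sections
variant_like = select(SAMPLE, "Parkinson", ("FT",))   # FT section only
print(sorted(pd_set), sorted(variant_like))
```

A plain substring match of this kind will miss a term that is split across two wrapped lines and will also pick up superficial mentions, which is one reason manual review for false positives remains necessary.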
Most of the 83 proteins did not have any known polymorphisms that would lead to an increased susceptibility to developing PD; their entries merely mentioned that they were implicated in the disease in some way. From this initial set of 83 proteins deemed to have some association with PD, a series of over sixty different subsets was produced. Some of these may well be labelled pseudo-sets, as they were based on terms that may have been written into the Swiss-Prot data-sheet entry in a superficial manner, for example by listing terms associated with PD such as ataxia, fatigue and dystonia, when in fact a given term may have no real relevance to that specific protein and is just part of a generalised description of PD.
The three subsets, all derived from the original 83 PD proteins, are: MITOCHOND, which consists of the mitochondrial proteins; PARK, the proteins that had at least one actual polymorphism believed to lead to an increased susceptibility to developing PD; and MT-C1D, which included proteins implicated in mitochondrial complex 1 deficiency. MITOCHOND and MT-C1D were extracted by use of a Perl script, as mentioned above, looking in the CC and FT sections, while PARK was extracted by looking only at the variants in the FT section. These subsets were compared and contrasted by use of another Perl script to form 'AND' sets, e.g. PARK AND MITOCHOND.
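Forming 'AND' sets amounts to taking set intersections. The following Python sketch is not the Perl script used in this work; it shows one way the pairwise comparison could be done. The function name `pairwise_and_sets` and the placeholder identifiers P01 to P05 are assumptions for illustration and do not correspond to real proteins or to the actual set memberships.

```python
from itertools import combinations


def pairwise_and_sets(named_sets):
    """Build every pairwise 'A AND B' set (intersection) from named sets.

    Pairs follow the insertion order of the input dictionary.
    """
    return {
        f"{a} AND {b}": named_sets[a] & named_sets[b]
        for a, b in combinations(named_sets, 2)
    }


# Toy memberships using placeholder identifiers (not real data).
subsets = {
    "PARK": {"P01", "P02", "P03"},
    "MITOCHOND": {"P02", "P03", "P04", "P05"},
    "MT-C1D": {"P04", "P05"},
}

and_sets = pairwise_and_sets(subsets)
for name, members in and_sets.items():
    print(name, sorted(members))

# A 'NOT' set is a set difference, e.g. PARK NOT MITOCHOND:
park_not_mitochond = subsets["PARK"] - subsets["MITOCHOND"]
print("PARK NOT MITOCHOND", sorted(park_not_mitochond))
```

An empty 'AND' set, as for PARK AND MT-C1D in this toy example, indicates that the two subsets share no proteins.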
For further details of these sets and for the free use of the Perl scripts (available on request) please visit Disease Motifs at http://www.diseasemotifs.co.uk

Results
Seven of the PARK proteins were also in the MITOCHOND (mitochondrial) set, PARK AND MITOCHOND (7/83), and hence fourteen proteins from the PARK set were not in the MITOCHOND set, i.e. were not mitochondrial proteins, PARK NOT MITOCHOND (14/83). All seventeen proteins in the MT-C1D set were included in the MITOCHOND set, while none of the MT-C1D proteins were found in the PARK set.
The conclusion is that one can perhaps split the proteins associated with PD into at least two groups depending on whether or not they have some association with the mitochondria. Further division is possible based on whether the proteins have been shown to have polymorphisms leading to an increased susceptibility to developing PD (Figures 1 and 2). MT-C1D is independent of the proteins that have been shown to increase susceptibility to developing PD (Tables 1-6).
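The reported counts can be cross-checked with simple set arithmetic. The Python sketch below takes only the set sizes stated in the text as input; the sizes of MITOCHOND NOT PARK and of the proteins in neither set are derived by subtraction and are not figures reported in the paper. The function and variable names are chosen for illustration.

```python
# Set sizes as reported in the text.
TOTAL = 83
PARK = 21
MITOCHOND = 33
MT_C1D = 17
PARK_AND_MITOCHOND = 7


def region_counts(total, park, mitochond, park_and_mitochond):
    """Sizes of the four disjoint regions formed by PARK and MITOCHOND."""
    return {
        "PARK AND MITOCHOND": park_and_mitochond,
        "PARK NOT MITOCHOND": park - park_and_mitochond,
        "MITOCHOND NOT PARK": mitochond - park_and_mitochond,
        # Inclusion-exclusion: total minus the size of the union.
        "NOT PARK NOT MITOCHOND": total - (park + mitochond - park_and_mitochond),
    }


regions = region_counts(TOTAL, PARK, MITOCHOND, PARK_AND_MITOCHOND)

# Non-mitochondrial proteins are those outside MITOCHOND.
non_mitochondrial = (regions["PARK NOT MITOCHOND"]
                     + regions["NOT PARK NOT MITOCHOND"])

# MT-C1D lies inside MITOCHOND and shares nothing with PARK, so it must
# fit inside the MITOCHOND NOT PARK region.
mt_c1d_fits = MT_C1D <= regions["MITOCHOND NOT PARK"]

print(regions)
print("non-mitochondrial:", non_mitochondrial, "MT-C1D fits:", mt_c1d_fits)
```

Under these assumptions the four regions sum to 83, the non-mitochondrial count agrees with the 50/83 given in the abstract, and the 17 MT-C1D proteins fit within the mitochondrial proteins that are not in PARK.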

Discussion
The purpose of this work is to look at the proteins associated with PD and to try to gain some understanding of the nature of PD from them. It has not been the purpose of this work to look in detail at the actual protein sequences themselves; only some secondary characteristics have been used to group these proteins into sets. Over sixty such sets have been created (see 'Set Theory' page in PD section at www.diseasemotifs.co.uk). The main consideration here is the connection between PD and the mitochondria, and whether two, three or more types of PD can be differentiated using this grouping method. Is it a realistic proposition to group mitochondrial proteins together and label this as a mitochondrial form of PD, and can this be matched to clinical considerations?
One of the main problems with the work is the naïve assumption that a set identity can be built up from the sometimes subjective literature contained within the comments (CC) of the data-sheets in the Swiss-Prot protein database. A search term such as 'fatigue' may appear in a Swiss-Prot data-sheet as a superficial description of the symptoms of PD in general, rather than as a term that has any real relevance to that particular protein. Attaching such a term to a group of proteins creates a pseudo-set, i.e. a group of proteins that have a superficial term as their grouping factor. Even some of the 83 proteins in the main PD set may be considered to be only superficially linked to PD.
However, there is surely some benefit in establishing: whether or not a protein is a mitochondrial protein; whether a group of proteins can be placed in one set due to their involvement in some process of a particular disease such as PD, MT-C1D, cancer or some other disease; or whether there is some similarity of function among a group of proteins, e.g. ferroxidase. All the proteins looked at here have some association with PD. Here we are looking mainly at three subsets taken from the larger set of 83 PD proteins: PARK, proteins that are known to have genetic polymorphisms believed to lead to an increased susceptibility to developing PD; MITOCHOND, made up of proteins that had some connection with the mitochondria; and MT-C1D, made up of proteins that are associated with mitochondrial complex 1 deficiency (Figure 3). One observation is that, of the 83 proteins in the entire collection of PD proteins, 33 (about 40%) have some connection with the mitochondria, which might suggest the importance of the role of mitochondrial proteins in PD. On the other hand, the greater part of the PARK group (14 of 21), the group of proteins with polymorphisms associated with an increased susceptibility to developing PD, has no obvious connection with the mitochondria.
From this part of the disease network it might be possible to infer that PD could be divided into at least two major components: non-mitochondrial PD and mitochondrial PD. Non-mitochondrial PD would include the PARK NOT MITOCHOND subset, whose 14 proteins have no obvious connection with the mitochondria. The second group, mitochondrial PD, could be split into two or more further groups: one has the genetic susceptibility to develop PD and involves the seven proteins in the PARK AND MITOCHOND subset; the other consists of the residual proteins in the MITOCHOND NOT PARK subset, which includes the MT-C1D subset. These residual proteins are connected with the mitochondria but have not been shown to carry any inheritable PD tendency. MT-C1D does not include any proteins from the PARK set, which may suggest that when PD develops from the MT-C1D pathway it could be characterised as a non-inheritable PD phenotype.
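The proposed division can be expressed as a simple decision rule on two membership flags. The Python sketch below is illustrative only: the group labels are shorthand invented here rather than established clinical categories, and the identifiers P01 to P06 are placeholders, not real proteins.

```python
def pd_group(in_mitochond, in_park):
    """Assign a protein to one of four groups from its set memberships."""
    if in_mitochond:
        return "mitochondrial PD (PARK)" if in_park else "mitochondrial PD (non-PARK)"
    return "non-mitochondrial PD (PARK)" if in_park else "non-mitochondrial PD (non-PARK)"


def group_proteins(all_proteins, park, mitochond):
    """Map each group label to the sorted list of proteins that fall in it."""
    groups = {}
    for protein in sorted(all_proteins):
        label = pd_group(protein in mitochond, protein in park)
        groups.setdefault(label, []).append(protein)
    return groups


# Toy memberships using placeholder identifiers (not real data).
all_proteins = {"P01", "P02", "P03", "P04", "P05", "P06"}
park = {"P01", "P02", "P03"}
mitochond = {"P03", "P04", "P05"}

groups = group_proteins(all_proteins, park, mitochond)
for label, members in groups.items():
    print(label, members)
```

In this scheme the MT-C1D proteins would all fall under the mitochondrial, non-PARK label, since they lie inside MITOCHOND and share no proteins with PARK.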
It is likely that there are additions (and possibly subtractions, owing to errors made in this study) to be made to the number and type of proteins involved in each set. The actual numbers in each set do not really matter, and neither do some of the errors, as it is the conclusions drawn from the consensus of each set that are important. One major error is that alpha-synuclein was missed as being associated with the mitochondria, although it has been shown to be localised to mitochondria [6]. Using GO terms might be more effective, at least for some terms, and some attempts have been made to do this.

Summary and conclusion
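It may be possible to split PD into categories or types based on particular protein characteristics, such as association with the mitochondria and the presence of polymorphisms that increase susceptibility to developing PD.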
Whether this can be translated into something that is clinically useful is debatable. Perhaps one of the symptoms of PD such as chronic 'fatigue' could suggest that the person with this 'type' of PD has a form of mitochondrial PD. If this were the case then one could perhaps tailor PD medication to suit the particular type of PD.