Identification of proteins associated with Mycobacterium tuberculosis virulence pathway by their polar profile

¹Departamento de Matemáticas, Facultad de Ciencias, Universidad Nacional Autónoma de México, C.P. 04510 D.F., México; ²Unidad de Cuidados intensivos y Unidad de Investigación Biomédica, Hospital Juárez de México, C.P. 07760 D.F., México; ³Departamento de Inmunología, Instituto de Investigaciones Biomédicas, Universidad Nacional Autónoma de México, C.P. 04510 D.F., México; ⁴Centro de Investigaciones Químicas, Universidad Autónoma del Estado de Morelos, C.P. 62209 Chamilpa, Cuernavaca, Morelos, México; ⁵Facultad de Ciencias de la Salud, Universidad Anahuac, C.P. 52786 Huixquilucan, Estado de México, México


INTRODUCTION
Although it is curable, tuberculosis is one of the most devastating diseases worldwide and is currently considered one of the principal public health problems. The World Health Organization estimates that one third of the world population is infected, with eight million new cases and two million deaths from the disease every year (Dye et al., 1989). A third of infected patients receive no treatment, while, paradoxically, multidrug-resistant strains of Mycobacterium tuberculosis keep increasing (BIW, 2010; WHO, 2014). In developing countries the prevalence of tuberculosis is rising progressively, and it is not possible to predict when it will be controlled (UN Millennium Project 2005, 2005).
Adherence to host cells is an essential virulence factor of pathogenic bacteria. Microbial adhesins, which are frequently glycoproteins (Schmidt et al., 2003; Upreti et al., 2003), may participate in this critical step in the pathogenesis of intracellular infections. Adhesins are located on the bacterial surface, where they interact with complementary receptors on host cells or with extracellular matrix components. A large number of adhesins have been identified in microbes other than mycobacteria (Klemm & Schembri, 2000), whereas few have been found in mycobacteria. One example is heparin-binding hemagglutinin, which is involved in bacterial attachment to lung tissue. In this study we included two mycobacterial adhesins, PstS-1 and LpqH, which interact with the macrophage mannose receptor and promote phagocytosis of the bacilli.
Infection of host cells is an important virulence feature of pathogenic mycobacteria (Diaz-Silvestre et al., 2005; Esparza et al., 2015). To further characterize proteins associated with Mycobacterium tuberculosis on the basis of their physico-chemical properties, we calibrated the mathematical-computational Polarity Index Method (PIM) (Polanco et al., 2012) with all the proteins of the Mycobacterium tuberculosis virulence pathway (MTVP) group in the Tuberculist Database (Lew et al., 2011). PIM is a supervised method that we have used to identify and characterize various peptide and protein groups from the linear representation of the protein (Polanco et al., 2012; 2013; 2013a; 2014; 2014a; 2014b; 2014c; 2014d; 2014e). Its metric considers polarity as the only physico-chemical property and is based on a polar matrix, which gives a static-dynamic overview of the electromagnetic balance of the peptide. Based on the results we report that: (i) the polarity profile is an effective discriminant of the MTVP group, (ii) PIM determines the polarity pattern of groups of the same domain or species, (iii) PIM makes it feasible to analyze massive databases, and (iv) the inflection points that appear in the polar profile figures characterize each group studied.

MATERIAL AND METHODS
The method described here exhaustively measures a single physico-chemical property, polarity. This property quantifies the electromagnetic balance of a protein through the electronegativity of the valence electrons of its amino acids. Linus Pauling (Pauling, 1955) called this tendency of an atom in a covalent bond to attract the shared electrons "electronegativity". Because PIM has been reported previously (Polanco et al., 2012), here we describe only the changes needed to obtain the results shown in this paper. We begin with an example to clarify the basic principles of the method.
The numerical sequence was read in pairs (i,j) from the N-terminus to the C-terminus, i.e. from left to right, shifting one amino acid at each step. For example, if the first pair was (i,j) = (4,1), the second pair would be (1,4), and so on until the last pair, (1,3). Each occurrence of a pair (i,j) was recorded in a matrix P[i,j], where i is the row and j the column. Matrix P[i,j] was then normalized so that its entries sum to one. A matrix P[i,j] was constructed for each evaluated sequence.
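The pair-reading step can be sketched in Python with NumPy. The grouping of the 20 amino acids into four polar classes, and their numbering 1 to 4, are assumptions made only for illustration (the actual assignment is the one defined in Polanco et al., 2012); `encode` and `polar_matrix` are hypothetical helper names, not routines of the authors' program.

```python
import numpy as np

# ASSUMPTION: illustrative grouping of residues into four polar classes.
# 1 = polar, positive charge; 2 = polar, negative charge;
# 3 = polar, uncharged; 4 = non-polar.
POLAR_CLASS = {
    **dict.fromkeys("HKR", 1),
    **dict.fromkeys("DE", 2),
    **dict.fromkeys("CGNQSTY", 3),
    **dict.fromkeys("AFILMPVW", 4),
}


def encode(sequence: str) -> list[int]:
    """Translate a one-letter amino acid sequence into polar classes 1..4,
    skipping letters that are not in the mapping."""
    return [POLAR_CLASS[aa] for aa in sequence.upper() if aa in POLAR_CLASS]


def polar_matrix(classes: list[int]) -> np.ndarray:
    """Read consecutive pairs (i, j) from N- to C-terminus, shifting one
    residue at a time, count them in a 4x4 matrix (row i, column j) and
    normalize so that all entries sum to one."""
    counts = np.zeros((4, 4))
    for i, j in zip(classes, classes[1:]):
        counts[i - 1, j - 1] += 1  # classes are 1-based, the array is 0-based
    total = counts.sum()
    return counts / total if total > 0 else counts


# Pairs from the text: (4,1), (1,4), ..., last pair (1,3).
P = polar_matrix([4, 1, 4, 1, 3])
print(P[3, 0], P[0, 3], P[0, 2])  # 0.5 0.25 0.25
```

Residues outside the 20 standard letters (e.g. X) are simply skipped in this sketch; the original program may treat them differently.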
The MTVP group, consisting of 239 proteins (Appendix 1), was converted into a single protein by joining the proteins end to end until all 239 were integrated into one sequence. From this sequence, matrix Q[i,j], which represents the target group (Table 1), was built in the same way as P[i,j].
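Building the group matrix Q[i,j] by concatenation can be sketched as follows, assuming each protein has already been encoded as a list of polar classes 1 to 4 (the toy sequences and the name `group_matrix` are hypothetical). One side effect is worth noting: joining the proteins end to end adds one extra pair at each junction, pairing the C-terminus of one protein with the N-terminus of the next.

```python
import numpy as np


def polar_matrix(classes):
    """Normalized 4x4 matrix of consecutive polar-class pairs (classes 1..4)."""
    counts = np.zeros((4, 4))
    for i, j in zip(classes, classes[1:]):
        counts[i - 1, j - 1] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def group_matrix(group):
    """Join every encoded protein of the group end to end into a single
    sequence and build Q[i, j] from it. Each junction contributes one pair
    (last residue of a protein, first residue of the next)."""
    joined = [c for protein in group for c in protein]
    return polar_matrix(joined)


# Two toy encoded proteins (hypothetical data, not MTVP sequences).
Q = group_matrix([[4, 1, 4], [1, 3]])
print(Q[3, 0])  # 0.5, including the (4, 1) junction pair
```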
Each matrix P[i,j] was added to matrix Q[i,j], and the resulting matrix (P[i,j] + Q[i,j]) was normalized to one, linearized into a vector, and sorted from the largest to the smallest frequency.
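The combination, normalization, linearization and sorting steps might look like the sketch below; `sorted_profile` is a hypothetical name, and the tie-breaking rule (row-major order for equal frequencies) is an assumption, since the text does not specify one.

```python
import numpy as np


def sorted_profile(P, Q):
    """Add P and Q, renormalize so the entries sum to one, linearize the 4x4
    matrix into its 16 polar interactions and sort them from the largest to
    the smallest frequency. Returns ((i, j), frequency) tuples, 1-based."""
    S = np.asarray(P, dtype=float) + np.asarray(Q, dtype=float)
    S = S / S.sum()
    flat = [((i + 1, j + 1), float(S[i, j])) for i in range(4) for j in range(4)]
    # sorted() is stable, so equal frequencies keep row-major order
    # (an assumption; the text does not say how ties are broken).
    return sorted(flat, key=lambda item: item[1], reverse=True)


# Example: P holds only the interaction (4, 1); Q is uniform over all 16.
P = np.zeros((4, 4))
P[3, 0] = 1.0
Q = np.full((4, 4), 1 / 16)
profile = sorted_profile(P, Q)
print(profile[0])  # ((4, 1), 0.53125)
```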

[Table: Polar interaction | Position]
Finally, each vector was compared with the rules (Table 2). In the example above, the vector satisfied the rules, and the protein was therefore considered an MTVP candidate. Note that these rules are now deduced entirely by the method, as the process has been automated; previously, they were derived by observing the polar incidences that occurred (or did not occur) at certain positions of the vector.
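Because the body of Table 2 is not reproduced here, the rule format below is an assumption: each rule is taken to state the set of positions (1-based, in the sorted vector) that a given polar interaction may occupy, in line with the "Polar interaction" and "Position" columns. The example rules and the function name `satisfies_rules` are invented for illustration; they are not the calibrated MTVP rules.

```python
def satisfies_rules(profile, rules):
    """Hypothetical rule check.

    profile: list of ((i, j), frequency) sorted from largest to smallest.
    rules:   dict mapping a polar interaction (i, j) to the set of 1-based
             positions it may occupy in the sorted vector.
    The protein is accepted only if every rule holds; an interaction missing
    from the profile fails its rule."""
    position = {pair: k + 1 for k, (pair, _) in enumerate(profile)}
    return all(position.get(pair) in allowed for pair, allowed in rules.items())


# Illustrative rules only; the calibrated rules are those of Table 2.
example_rules = {(4, 1): {1, 2}, (1, 3): {1, 2, 3, 4}}
profile = [((4, 1), 0.5), ((1, 3), 0.3), ((1, 4), 0.2)]
print(satisfies_rules(profile, example_rules))  # True
```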

Polarity Index Method Updates
PIM is a supervised, QSAR-type algorithm; here its training set consisted of the proteins of the MTVP group extracted from the Tuberculist Database (Lew et al., 2011). The following modifications were made: matrix Q[i,j] in the source program (Polanco et al., 2012) was replaced by Table 1, which represents the polarity of the MTVP group from the Tuberculist Database (Lew et al., 2011).
The rule in the source program was replaced by the vector (P[i,j] + Q[i,j]), subject to the rules given in Table 2, to calibrate the method with the groups of the Uniprot Database.
The rule in the source program (Polanco et al., 2012) was likewise replaced by the vector (P[i,j] + Q[i,j]), subject to the rules in Table 2, to calibrate the method with the groups of the APD2 Database (Wang & Wang, 2009).

Test
The testing plan had the following steps: (i) calibrate PIM with the 228 MTVP proteins; (ii) test PIM with the four groups (Section: Trial Data Preparation) from the APD2 Database, with a programmed efficiency of 70%; (iii) test PIM with the four groups (Section: Trial Data Preparation) from the Uniprot Database, with a programmed efficiency of 70%; (iv) compare PIM acceptance/rejection of the 228 MTVP proteins to verify whether the functional groups taken from the APD2 and Uniprot Databases influence the calibration of the method; and (v) test the PIM patterns obtained from the APD2 and Uniprot Databases with the four antigens of Mycobacterium tuberculosis (Section: Trial Data Preparation).

RESULTS
PIM was calibrated with the MTVP group from the Tuberculist Database (Lew et al., 2011) and compared separately with three functional groups, fungi, bacteria and viruses (Section: Trial Data Preparation), from the APD2 Database and from the Uniprot Database.
The efficiency of the method with the functional groups from the APD2 Database was 161/228 (≈70%); with the Uniprot Database it was 157/228 (≈69%) (Table 4). PIM also excluded the fungal, bacterial and viral functional groups from the APD2 Database (Wang & Wang, 2009) and the Uniprot Database (Magrane & Uniprot, 2011).
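The efficiencies are simple hit ratios (accepted MTVP proteins over the 228 tested); computing them to one decimal shows how the rounded percentages arise. The function name `efficiency` is illustrative.

```python
def efficiency(hits: int, total: int) -> float:
    """Efficiency as used in the text: accepted MTVP proteins / total MTVP proteins."""
    return hits / total


for database, hits in [("APD2", 161), ("Uniprot", 157)]:
    print(f"{database}: {hits}/228 = {efficiency(hits, 228):.1%}")
```

This prints 70.6% for APD2 and 68.9% for Uniprot.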
The inflection points of the four groups from the Uniprot Database lay at different positions on the x-axis (Fig. 1), although those of the fungal and viral groups were closer to each other; in the groups from the APD2 Database, the positions of these points differed markedly across all groups (Fig. 2).
Using the pattern of the MTVP group from the Tuberculist Database (Lew et al., 2011), PIM accepted the four antigens of Mycobacterium tuberculosis (Section: Trial Data Preparation; Table 4) both when applied to the APD2 Database and when applied to the Uniprot Database.

DISCUSSION
The discriminative efficiency of PIM in identifying proteins associated with the MTVP group is high; it was obtained with two slightly different patterns that identified the same group (70%). Because PIM is a supervised algorithm, its calibration always depends on a training set. Even though the training set was the same for both databases, the PIM efficiency (hits/total) of the calibration with the MTVP group from the Tuberculist Database (Lew et al., 2011) differed between the APD2 Database (Wang & Wang, 2009) and the Uniprot Database (Magrane & Uniprot, 2011), according to the other functional groups it was compared to, i.e. fungi, viruses and bacteria. Considering that the metric uses polarity as its single discriminating property, we conjecture that this property is essential in the formation of proteins; this conjecture was supported by the test, as the pattern of the method was not altered by the test files.
Note that the Uniprot Database contains tens of thousands of proteins associated with different groups, e.g. viruses, bacteria and fungi, among others; the application of PIM, particularly to the fungal (47130 proteins) and viral (1104 proteins) groups, was therefore fully automated. The method needed about 20 minutes to obtain the polar profile from the APD2 Database and 3 hours for the Uniprot Database. Once this was done, PIM needed only 3 seconds to identify the main association of the proteins analyzed in this study. Although these timings vary between computer platforms, they give an estimate of the method's potential for use on massive databases. We are currently implementing a website that will make the method accessible without the Fortran 77 (F77) source program; it will allow the analysis of up to 10 000 proteins in FASTA format, with results sent to users by email. In future work we will consider a sub-classification of the virulence pathway from the Tuberculist Database, which should deepen our knowledge of pathogenesis.
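A web service of this kind would accept proteins in FASTA format. A minimal reader that could feed such a batch into the polar-profile step is sketched below; it is an illustrative assumption, not part of the described implementation, and a production tool might instead rely on an established parser such as Biopython's `SeqIO`. The example records are hypothetical.

```python
def read_fasta(text: str) -> dict[str, str]:
    """Minimal FASTA reader: maps each header line (without '>') to its
    sequence, joining wrapped lines and ignoring blank lines. Later records
    with a duplicate header overwrite earlier ones."""
    records: dict[str, str] = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:  # sequence lines before any header are ignored
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = ">protA toy entry\nMKTAY\nIAKQR\n>protB\nDEEK\n"
print(read_fasta(example))  # {'protA toy entry': 'MKTAYIAKQR', 'protB': 'DEEK'}
```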
The static-dynamic profile deserves separate consideration. The distributions of relative frequencies of the groups described show that their inflection points do not coincide. This has already been observed for other peptide and protein groups (Polanco et al., 2012; 2013; 2013a; 2014; 2014a; 2014b; 2014c; 2014d; 2014e). We assume that the location of these points is the underlying identifier of the electromagnetic balance of a protein. However, we do not use this identification in the algorithm because of the difficulty of analytically constructing the smoothed curve.
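Although the authors do not use the inflection points as a criterion, one simple way to locate them without building a smoothed curve is to look for sign changes in the discrete second difference of the sorted frequency profile. This is a swapped-in numerical heuristic, not part of PIM, and `inflection_positions` is a hypothetical name.

```python
import numpy as np


def inflection_positions(freqs, tol=1e-12):
    """Approximate inflection points of a sampled curve (e.g. the 16 sorted
    polar-interaction frequencies) as the 1-based x positions where the
    discrete second difference changes sign. Near-zero second differences
    are skipped so that straight stretches do not create spurious points."""
    f = np.asarray(freqs, dtype=float)
    d2 = f[2:] - 2 * f[1:-1] + f[:-2]  # d2[m] is the curvature at x = m + 2
    points, prev_sign = [], 0
    for m, value in enumerate(d2):
        if abs(value) <= tol:
            continue
        sign = 1 if value > 0 else -1
        if prev_sign and sign != prev_sign:
            points.append(m + 2)  # first x with the new curvature sign
        prev_sign = sign
    return points


print(inflection_positions([10, 9, 7, 4, 2, 1]))  # [4]
```

Because the estimate uses raw frequencies, noise in a group's profile can add spurious sign changes; the `tol` threshold only suppresses exactly flat stretches.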
Given the efficiency reported for identifying proteins associated with the MTVP group (from the Tuberculist Database) in the APD2 and Uniprot Databases, we recommend using the method described here as a "first filter" to identify these proteins. The method also underlines the importance of using polarity as the only physico-chemical property and of adopting matrix structures for its evaluation, as these algebraic structures provide more information about the phenomenon studied (Polanco et al., 2014d).

CONCLUSIONS
PIM is an effective, fully automated algorithm that achieves about 70% efficiency in identifying proteins associated with the MTVP group in both databases, and it has proven equally effective at rejecting false positives among all the peptides of the main functional groups.

Figure 1. Polarity distribution comparison of the groups mentioned in the text, taken from the Uniprot Database (Magrane & Uniprot, 2011), and the MTVP group from the Tuberculist Database (Lew et al., 2011). The x-axis corresponds to the 16 polar interactions (Section: Trial Data Preparation).

Figure 2. Polarity distribution comparison of the groups mentioned in the text, taken from the APD2 Database (Wang & Wang, 2009), and the MTVP group from the Tuberculist Database (Lew et al., 2011). The x-axis corresponds to the 16 polar interactions (Section: Trial Data Preparation).