Computer Science & Systems Evaluation of the Efficacy of Cancer Drugs by Using the Second Largest Eigenvalue of Metabolic Cancer Pathways

Cancer is a system with thousands of genes and proteins with the complex interactions between them. By examining the cancer drug activity on only part of this system, we do not know in which direction the whole system will evolve, and whether therapy will be useful or not. This is one of the main reasons why cancer therapies still do not meet our expectations. In order to find more effective anticancer therapies, it is important to consider the impact of drugs on the entire cancer system. The second largest eigenvalue plays a key role in complex systems The algorithms the second largest eigenvalue of graphs have been already used to speed up processes in computer networks and differential cryptanalysis. Based on the aforementioned, it could be assumed that maximizing the second highest eigenvalue could slow down the processes in metabolic networks that describe processes in cancer. To verify our hypothesis, we have built the in silico model of cancer Vini and run it on a supercomputer. Vini transformed the metabolic pathways of cancer from Kyoto Encyclopedia of Genes and Genomes into the binding energy matrices representing binding energies between the genes and proteins on one side and drugs being investigated on another side. Some matrix elements also represent interactions between proteins and genes. Then, it calculated the second largest eigenvalues of these matrices. In the end, we compared the calculated results against the existing in vitro and in vivo experimental results. The calculated efficacy of cancer drugs was confirmed in 79.31% of in vivo experimental cases, and in 92.30% of in vitro experimental cases. These results show that the second largest eigenvalue plays an important role in metabolic cancer networks and that the Vini model can be an effective aid in finding more effective cancer therapies.


Introduction
Despite the major improvements in the effectiveness of cancer therapy over the last decades, cancer remains one of the leading causes of mortality in the world. Cancer was responsible for 8.8 million deaths worldwide in 2012 [1] and according to World Health Organization cancer fact sheets, with approximately 14 million new cases every year, the second leading cause of death globally. In addition, an increase in the number of new cases from 14.1 million in 2012 to 21.6 million in 2030 is expected. According to the US National Cancer Institute, a 5-year survival rates for patients with metastatic stage IV colon cancer, rectal cancer, kidney and pancreatic cancer, small cell lung cancer (SCLC), non-small cell lung cancer (NSCLC) and hepatocellular carcinoma are 11, 12, 8, 1, 2, 1 and 3 percent respectively, giving an average 5-year survival rate of 5.42 percent. This means that out of 1,000 patients diagnosed with one of these types of cancer in stage IV, and after the period of 5 years, about five of them will still be alive. Even if we take into account that some of these patients will die from some other diseases not directly related to cancer, the five-year average survival rate is weak and unsatisfactory. However, with the emergence of supercomputers, we can analyze in more depth the natural laws of life on Earth, including the metabolic processes regarding cancer. We can analyze data obtained from high-throughput genomic and proteomic tools, solve systems with millions of linear equations, and analyze graphs that represent thousands of genes and proteins. The better we understand life, the more chances we have of finding more effective cancer therapies. Efficient processing of huge amount of data condensed in human genome, with the goal to find the optimal cancer therapy, lies at the heart of our in silico model of cancer Vini. The Vini model was developed as a parallel application for supercomputers. The need for supercomputing power lies in the complexity of cancer disease, where thousands of molecules are organized in multi-molecular complexes and interact with one another. In addition, these complexes interact with one another, leading to exponentially increased complexity [2].
There are several approaches that help us understand the complexity of cancer and the processes from one genetically mutated cell to the last metastatic cancer phase. The chaos theory, which is the theory of highly non-linear dynamical systems, is trying to find the laws to model this complexity [3], either by setting systems of nonlinear equations describing cancer behavior or by investigating cancer mechanisms by means of strange attractors [4]. In contrary to the chaos theory, omics approaches gather huge amounts of data generated by changes of all genes and proteins, trying to find functional entities within the complex cancer organization [5]. System biology is different from the omics approach and from the chaos theory. It tries to disclose the molecular structures of individual genes and proteins, and then integrate them into larger information structures [6]. These larger information structures are organized into databases focusing on cancer-related genes such as COSMIC [7], or focusing on cancer drugs such as Cancer Cell Line Encyclopedia [8]. They are analyzed using methods based on known pathways such as PathScan [9] and Netbox [10], or using networks describing metabolic pathways such as those from Kyoto Encyclopedia of Genes and Genomes (KEGG) database [11], and MetaCyc [12], which contain proteins, chemical compounds, and relationships between these entities.
The Vini model approaches cancer in the spirit of system biology, and is oriented towards analysis of metabolic pathways. It was tested with the data from KEGG database, which, along with Reactome [13], is considered the most reliable source of experimentally confirmed metabolic reactions. Systematic analysis [14] revealed 15161 compounds in KEGG, 14621 of them with structure, and with the mean associated pathway per compound of 0.67. Since Vini is an in silico model of cancer [15], it only analyzes cancer pathways, in this case KEGG cancer pathways. They contain more than 600 compounds of which about 400 have a structure. KEGG cancer pathways are network diagrams describing the metabolism of 17 specific cancer types: gastric cancer, colorectal cancer, pancreatic cancer, hepatocellular carcinoma, endometrial cancer, breast cancer, prostate cancer, thyroid cancer, bladder cancer, acute and chronic myeloid leukemia, melanoma, basal cell carcinoma, small cell lung cancer, non-small cell lung cancer, renal cell carcinoma, and glioma. There is additional cancer pathway providing the general overview of cancer metabolism, and besides some of the main hallmarks of cancers [16] like genomic instability, proliferation, insensitivity to anti-growth signals, evading apoptosis sustained angiogenesis and tissue invasion and metastasis, it describes additional cancer mechanisms like mitochondrion, immortality, resistance to chemotherapy, block of differentiation, microtubule, fumarate to S-malate conversion, and genomic damage. Typical KEGG cancer pathway elements are rectangles representing genes, circles representing chemical compounds and glycans, and lines representing reactions and interactions. Specific boxes may exist providing additional information (Figure 1).
Besides homo sapiens cancer pathways, KEGG cancer pathways are available for more than fifty other species, e.g., Canis familiaris, Felix domesticus, Mus musculus, Gorilla gorilla etc.
There are various ways of how in silico models of cancer complement in vitro and in vivo experiments, thus helping us to find better and more effective cancer therapies and to set better diagnosis. These models analyze cancer processes either at higher scale, like tumor extracellular matrix [17], or at the molecular level [18]. The cancer analysis can be performed with statistically based graphing models developed on genomic [19], transcriptional [20] and trajectory levels [21], statistically derived network models [22] trying to detect the basic distribution of probability from data samples and metabolic models that use ordinary [23] or partial differential equations [24]. A comprehensive overview of the various statistical and mathematical methods used in various in silico models of cancer can be found [25].
Although in silico models of cancer are already valuable tools for the detection of new drugs [26] and the establishment of more precise cancer diagnoses [27], they are still not accurate enough to be implemented in clinical practice. Increasing the in silico model accuracy is one of the most important goals, and this can be accomplished by using larger and more complete datasets, by personalized approach to each cancer patient, and by using well developed mathematical algorithms which solve large optimization problems. Optimal solution means more precise diagnosis and more effective therapy. Some actions in this direction have already been made. Non-convex  By multiplying the matrix A with the structure pdb(l), original matrix is transformed to the matrix A ' : Let's define operator V(A') acting on the main-diagonal elements of A': As the next, let's node 1 connects to nodes 2 and 3 via added edges with weights rel (12) and rel (13). In addition, let's nodes 2 and 3 connect to node 4 via edges with weights rel (24) and rel(34) ( Figure 3).
Then, the adjacency matrix A'' of this graph will be: As the next, assume that nodes are genes in a certain KEGG cancer pathway, and edges are interactions between them. pdb(r1) to pdb(r4) are molecular structure files of these genes in pdb format from Protein Data Bank [35]. pdb(l) is molecular structure file of cancer drug in pdb format. rel (12), rel (13), rel (2,4) and rel (3,4) are interactions between genes. Operator V is a program for molecular docking computing the binding energies e1, e2, e3, e4 between genes and chemical compound. Then, matrix A'' transforms into a new matrix. We call this matrix binding energy matrix, abbreviated BE. For this simple example, the BE matrix will be: Enabling more effective cancer therapies by using eigenvalue optimization [33] is the main goal of this study and lies at the heart of the in silico model of cancer Vini. It is to be expected that by increasing the second largest eigenvalue (in further text abbreviated SLEM) of the network describing the metabolic pathway of cancer the carcinogenesis process will slow down and perhaps even completely stop, which will be discussed in the next section. Since the algorithm for maximizing the SLEM of a certain network which describes the metabolic pathway of cancer is unknown, Vini uses supercomputing power and seeks for the solution with the highest SLEM value in the space of all available solutions. This space is defined as a set of matrices whose elements represent the binding energies between the genes and proteins defined by the metabolic pathway of cancer and drugs whose efficacy is being investigated.

The Model
Network diagrams are graphical presentations of networks. Network theory is a part of graph theory, and the network can be viewed as a graph in which nodes have attributes [34]. In that sense, KEGG network diagrams are graphs with nodes representing the names of genes, proteins, and compounds. Besides, KEGG networks describe the various types of relationships between genes, proteins, and compounds. Therefore they can be also viewed as weighted and directed graphs, that is, graphs that have some values associated with directed edges. In computer science and other applications, these values are usually real numbers, but may also be complex numbers, lists, structures, and so on. We may here recall that in the graph theory, a loop is defined as an edge that connects the node to itself. This allows us to define fully weighted graphs, with values assigned both to nodes and edges.

Definition 1
Fully weighted graphs are graphs with weighted edges and weighted loops. Edges and loops in fully weighted graphs may have directions. This allows us to define fully weighted and directed graphs.

Definition 2
Fully weighted and directed graphs are graphs with weighted and directed edges and weighted and directed loops.
In case of metabolic networks, fully weighted and directed graphs allow allocating molecular structures of genes, proteins and compounds to graph nodes, and relationships to graph edges. Matrices representing such graphs can be constructed, and algebraic operations performed on these matrices. For example, consider the fully weighted and directed graph, each node with one loop. Weights are molecular structures pdb(r1), pdb(r2), pdb(r3) and pdb(r4), associated to loops on nodes 1, 2, 3 and 4, as shown the next (Figure 2).
Adjacency matrices are used for the algebraic representation of graphs. The adjacency matrix A for this graph is: representing genes, proteins, RNAs, chemical compounds, glycans and chemical reactions. Based on additional information on chemical compounds whose anti-cancer effectiveness needs to be estimated. Vini transforms cancer pathways into binding energy matrices and after that calculates SLEM values of these matrices. Binding energy matrices describe the interaction of cancer with chemical compounds, such as with approved cancer drugs, anticancer herbal substances, vitamins, dietary supplements, opioids, anti-inflammatory and other drugs sometimes accompanying the cancer therapy. We assumed that the SLEM values of these matrices are related to the rate of development of the cancer processes defined by these metabolic pathways, with the higher SLEM value expressing the slower rate and vice versa. In order to check our assumption, we let Vini to process 17 KEGG cancer pathways against the number of chemical compounds, to create binding energy matrices, and finally to calculate their SLEM values. The calculation was performed on 600 Intel core processors and lasted for about 160 hours.
Besides, Table S1 in the supplementary information lists the SLEM values for all KEGG cancer pathways and chemical compounds investigated. Evaluation of the model was performed by comparing the SLEM values with the cancer drugs efficiency results obtained from the existing in vivo and in vitro experiments. The results of the evaluation against in vivo experimental results are given in Table 3.
The fields in the table for cancer drugs with no in vivo reference are labeled in yellow. The numbers in the table fields are references  of the relevant studies. The probability that Vini will correctly predict the efficiency of a cancer drug against a specific type of cancer is defined as the ratio of the number of green fields to the sum of green and violet fields and is 0.793. The abbreviations: PTX-paclitaxel, DTX -docetaxel, VNB-vinorelbine tartrate, TRP-triptorelin pamoate, EVEeverolimus, EBU-eribulin mesylate, IDA-idarubicin hydrochloride, MDT-midostaurin, TES-temsirolimus, RCC-renal cell carcinoma, HCC-hepatocellular carcinoma, BCC-basal cell carcinoma, CMLchronic myeloid leukemia, AML-acute myeloid leukemia, SCLCsmall cell lung cancer, non-small cell lung cancer. The short description of each study, together with the results and conclusion from the study, is in Table S2 of supplementary information.
The accuracy of the Vini model in predicting in vitro efficacy of cancer drugs was estimated by comparing the SLEM values with data from the NCI-60 database. This database contains data about BE matrices are of the much bigger size than the matrix in this example. Their size depends on cancer type and is in the range from several thousand to several tens of thousands of elements, with dozens of interactions between them. Consequently, screening for the most efficient drug candidates against the cancer is computationally expensive. It consists of two stages. In the first stage, Vini downloaded KEGG cancer pathway structure from Genome Net, a Japanese network of database and computational services for genome research and related research areas in biomedical sciences. The structure is then parsed for genes, proteins, compounds and relation entries. Then, the adjacency matrix is created according to the procedure already explained through (1)- (5). V operator is Auto dock Vina program for molecular docking [36], which calculates e1, e2, . . . , en binding energy levels between genes and drugs. For each drug, one BE matrix is created, and the process is repeated until all available drugs are screened. In the second stage, second largest eigenvalues of these matrices are computed.
Our thesis is that the drug with the highest second largest eigenvalue has the highest anticancer activity. There are several reasons for such thinking. In 1973, Fiedler [37] and Donath and Hoffmann [38] pointed to the possibility of using eigenvalues for graph partitioning. According to Tomic [39], the SLEM was reported as the measure of the execution speed of parallel processing systems. Boyd et al. [40] developed the algorithm for minimizing the SLEM in stochastic networks. With some modifications, this algorithm was successfully implemented for the optimization of High Performance Linpack benchmark [41] on supercomputer [42]. Consequently, since a lower SLEM value ensures the faster run in computer networks, it is expected that the cancer drugs with higher SLEM values will more effectively slow down carcinogenesis in cancer networks.
Vini is written in Linux shell scripting language and Python, and uses Auto dock Vina for computing molecular binding energies. As the calculation of binding energies and the creation of binding energy matrices is computationally intensive, Vini is targeted to run on the supercomputer. It is an open source code and is available from the authors of this research on the request. However, the usage of metabolic cancer pathways generally requires the proper licensing from the providers of these pathways. in vitro anti-cancer efficacy for more than 100,000 compounds and 50,000 natural products extracts against 60 types of tumor cells, which represent nine types of cancer tissues [100]. In the NCI-60 database, the efficacy of a cancer drug is defined as the logarithm of a drug concentration which by 50% inhibits cancer cell division. The drug with the highest SLEM value for the specific KEGG cancer pathways was validated in the NCI-60 database considering its efficacy against the corresponding tumor cell type. If that drug is not found in the NCI-60 database, the search is repeated for the drug with the next highest SLEM value and so on, until the five drugs with the highest SLEM values were found. Similarly, the drug with the lowest SLEM value for the specific KEGG cancer pathway is searched in the NCI-60 database against the same type of cancer modeled with that specific pathway. If this drug is not found in the NCI-60 database, the search is repeated for the drug with the next lowest SLEM value and so on, until the five drugs with the lowest SLEM values were found. The search is repeated for 9 KEGG cancer pathways for which NCI-60 has data on the efficacy of cancer drugs against corresponding cancer cells, including cells of colon cancer, renal cell carcinoma, glioma, prostate cancer, melanoma, chronic myeloid leukemia, acute myeloid leukemia, non-small cell lung cancer and breast cancer. For example, in vitro activity of five cancer drugs with the highest SLEM values and five cancer drugs with the highest SLEM values against the KEGG prostate cancer pathway is shown in Table 4.

KEGG cancer pathways are network diagrams with various entities
Besides, Table S3 in the supplementary information lists the drugs with the highest and the lowest SLEM values and their logarithmic (-log 4 ) concentrations that inhibit cell division by 50% for all 9 cancer tissues in NCI-60 database.
The probability that Vini will accurately identify the drugs according to their in vitro activity depends on the cancer type and is shown in Table 5.

Conclusions
In silico model of cancer Vini was developed on the hypothesis that there is a correlation between the anti-cancer activity of chemical compounds and SLEM values of binding energy matrices. Thereby, binding energy matrices represent binding energies between the genes and proteins on one side and drugs being investigated on another side.  To confirm this, we used Vini to create 3417 matrices that describe the interactions of 132 cancer drugs and 69 other chemical compounds with 17 KEGG cancer pathways and to calculate their SLEM values.
Calculated SLEM values were compared with the logarithmic concentration values of cancer drugs in NCI-60 database that inhibit the cancer cells division by 50%, and with the results of relevant in vivo experiments. The correlation of the SLEM values with the data from the NCI-60 base was found in 92.30% of cases, while the correlation with the data from in vivo experiments was found in 79.31% of cases.
The difference in predicting in vivo and in vitro drug efficacy is inherent to the model. Vini acts at the genomic and proteomic level defined by the metabolic pathways of cancer. Therefore, it does not take into the account effects of tumor stroma [101], extracellular matrix [102], pharmacokinetics [103], and toxicity [104] of cancer drugs.
Other well-developed computer models like Swiss ADME [105] can be used to bridge this gap. Besides, as Vini calculates the binding energies between a certain drug and only one domain of each gene in metabolic pathways, there is a certain loss of information. In addition, Auto dock Vina performs well molecular docking calculations of drugs with molar masses of up to several hundred Daltons but is not suitable for the drugs with higher molar masses. Therefore, further improvement in the accuracy and the functionality of Vini can be obtained by letting Vini compute the binding energies of drugs across several domains of genes, and by integrating additional molecular docking tools able to work with large molecule drugs [106]. Besides KEGG, other metabolic pathway databases like Reactome [107] can be used. Thus there is a chance to further increase the functionality of Vini and accuracy of drug efficacy prediction by combining two or more metabolic pathways.
Vini already points out that some drugs not approved for certain types of cancer are effective but in vivo and in vitro experiments Table 3: The list of cancer drugs with the five largest SLEM values for the specific KEGG cancer pathways. The only exception to this "rule five" is in case of melanoma, where eribulin on the 5th and vinorelbine on the 6th place has the same SLEM value. If the efficiency of a drug is confirmed by the existing in vivo testing, the appropriate field of the table is labeled green. Otherwise it is labeled purple. The fields in the table for cancer drugs with no in vivo reference are labeled in yellow. The numbers in the table fields are references  of the relevant studies. The probability that Vini will correctly predict the efficiency of a cancer drug against a specific type of cancer is defined as the ratio of the number of green fields to the sum of green and violet fields and is 0.793. The abbreviations: PTX-paclitaxel, DTX-docetaxel, VNB-vinorelbine tartrate, TRP-triptorelin pamoate, EVE-everolimus, EBU-eribulin mesylate, IDA-idarubicin hydrochloride, MDT-midostaurin, TES-temsirolimus, RCC-renal cell carcinoma, HCC-hepatocellular carcinoma, BCC-basal cell carcinoma, CML-chronic myeloid leukemia, AML-acute myeloid leukemia, SCLC-small cell lung cancer, non-small cell lung cancer. The short description of each study, together with the results and conclusion from the study, is in the S2  are missing to confirm this. A good example for this is triptorelin pamoate, approved only for the therapy of prostate cancer. However, Vini declares it as effective in colorectal cancer, melanoma, and acute myeloid leukemia. There are also other chemical compounds calculated by Vini as effective against various cancer types, like several herbal compounds and high molecular weight hyaluronic acid [108]. Additionally, these substances have relatively low toxicity, and there is a chance they may work in synergy with the approved cancer drugs. That reinforces our opinion that Vini is an effective aid in the process of screening new drug candidates for cancer therapy.
Likewise, it's presumable that the results are applicable also in veterinary oncology, which is to be more profoundly investigated in the future.
One of the major problems in cancer therapy is frequent appearance of chemoresistence in case of chemotherapy with only one drug. That is why our future research will focus on the evaluation of the effectiveness of combinations of two or more drugs. Another problem is the resistance of some types of cancer to radiotherapy. Therefore, in our future research, a special emphasis will be put on extending the functionality of the Vini model in this direction.   Table 5: Lists the probabilities of the correct differentiation of in vitro activity of the drugs against various NCI-60 cell lines. The mean probability for nine NCI-60 cell lines is 0.92. These results confirm the high accuracy of Vini in differentiating between the drugs with strong and weak in vitro activity against specific cancer types. Table S4 in the supporting information lists the drugs with the highest and the lowest SLEM values and their corresponding inhibitory values for all 9 cancer tissues in NCI-60 database. The abbreviations: RCC -renal cell carcinoma, CML -chronic myeloid leukemia, ACL -acute myeloid leukemia, NSCLC -non-small cell lung cancer.