
Gene expression and protein–protein interaction data for identification of colon cancer related genes using f-information measures

Natural Computing

Abstract

One of the most important and challenging problems in functional genomics is how to select disease genes. In this regard, the paper presents a new computational method to identify disease genes by integrating gene expression profiles with shortest path analysis of protein–protein interaction networks. The \(f\)-information based maximum relevance-maximum significance framework is used to select differentially expressed genes as disease genes from gene expression profiles, while the functional protein association network is used to study the mechanism of diseases. An important finding is that some \(f\)-information measures are effective for selecting relevant and significant genes from microarray data. An extensive experimental study on colorectal cancer shows that the genes identified by the integrated method contain more colorectal cancer genes than those identified from gene expression profiles alone, irrespective of the gene selection algorithm used. These genes also have greater functional similarity with the reported colorectal cancer genes than those identified from gene expression profiles alone. Enrichment analysis shows that the obtained genes are associated with some of the important KEGG pathways. All these results indicate that the integrated method is quite promising and may become a useful tool for identifying disease genes.
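The two ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which \(f\)-information measures, discretization, or edge weights are used, so everything concrete here is an assumption. The sketch uses V-information, one member of the \(f\)-information family, defined as \(\sum_{x,y} |p(x,y) - p(x)p(y)|\), to score how relevant a discretized gene is to the class labels, and Dijkstra's algorithm to find a shortest path between two genes in a weighted interaction graph. The function names, the toy expression values, and the toy graph are all hypothetical.

```python
import heapq
from collections import Counter


def v_information(xs, ys):
    """V-information between two discrete sequences.

    Computes sum over all (x, y) of |p(x, y) - p(x) * p(y)|,
    using empirical frequencies. Zero means the sequences look independent.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        abs(joint.get((x, y), 0) / n - (px[x] / n) * (py[y] / n))
        for x in px
        for y in py
    )


def shortest_path(graph, source, target):
    """Dijkstra's algorithm on a graph given as {node: {neighbor: weight}}.

    Returns the list of nodes on a shortest path, or None if unreachable.
    Weights must be non-negative.
    """
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical toy data: six samples, binary class labels,
# and two genes already discretized into {0, 1}.
labels = [0, 0, 0, 1, 1, 1]
gene_a = [0, 0, 0, 1, 1, 1]  # tracks the labels exactly
gene_b = [0, 1, 0, 1, 0, 1]  # only weakly related to the labels

relevance_a = v_information(gene_a, labels)
relevance_b = v_information(gene_b, labels)

# Hypothetical interaction graph; the edge weights are made up.
ppi = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 4.0, "B": 1.0},
}
path_a_to_c = shortest_path(ppi, "A", "C")
```

In this toy case gene_a receives a higher relevance score than gene_b, and the shortest path from A to C passes through B. In an integrated scheme of the kind the abstract describes, such intermediate nodes on shortest paths between selected genes would be natural candidates for further study, though the exact procedure used in the paper is not given here.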



Acknowledgments

The work was done when one of the authors, S. Paul, was a Visiting Scientist at the Indian Statistical Institute, Kolkata.

Author information

Corresponding author

Correspondence to Sushmita Paul.


Cite this article

Paul, S., Maji, P. Gene expression and protein–protein interaction data for identification of colon cancer related genes using f-information measures. Nat Comput 15, 449–463 (2016). https://doi.org/10.1007/s11047-015-9485-6


  • DOI: https://doi.org/10.1007/s11047-015-9485-6
