Abstract
Binary Matrix Factorization (BMF) sits at the core of many data analysis pipelines. It is used to cluster items, to extract categorical characteristics of observations, and to build recommendation systems for users interacting with itemsets. The most common algorithms approximate the factorization through gradient descent; however, their results are only approximately binary. Once thresholded, the reconstruction error is so high that the factor matrices no longer represent the original data. The analyst is therefore forced to choose between precision and explainability. We derive theoretical results that greatly improve the solution of the exact subproblem of this factorization. These results enable a backtracking approach that solves the linearized formulation of the subproblem on large binary matrices, taking advantage of their sparsity in real settings. Finally, we test this approach by post-processing the matrices produced by gradient descent algorithms: the new backtracking yields truly binary factor matrices with a reduced reconstruction error, close to the level that gradient descent reaches. We tested our algorithm on gene expression datasets and obtained an error rate comparable to that of the relaxed continuous problem before discretization. The discretized matrices allow domain experts to interrogate biclusters of gene expressions and samples.
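To make the discretization step concrete, the following Python sketch thresholds a pair of relaxed factors and then re-solves the binary subproblem for H exactly with a simple branch-and-bound backtracking. It rests on assumptions the abstract does not state: the subproblem is taken to be "fix a binary W and, for each column x of X, find a binary h minimizing ||x - Wh||² under the ordinary matrix product", the 0.5 threshold is arbitrary, and the relaxed factors are made-up numbers standing in for a gradient-descent output. The pruning bound is a generic interval relaxation, not the paper's theoretical results, and it makes no use of sparsity; all function names are hypothetical.

```python
"""Hypothetical sketch: threshold a relaxed BMF, then re-solve H exactly.

Assumes the subproblem is: fix a binary W (m x k) and, for every column x of
the binary data X, find a binary h minimizing ||x - W h||^2 under the ordinary
(not Boolean) matrix product. Illustration only, not the paper's algorithm.
"""


def threshold(M, tau=0.5):
    """Round every entry of a relaxed factor to 0/1 with cut-off tau."""
    return [[1 if v >= tau else 0 for v in row] for row in M]


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def sq_error(X, W, H):
    """Squared Frobenius reconstruction error ||X - W H||^2."""
    WH = matmul(W, H)
    return sum((x - y) ** 2 for rx, ry in zip(X, WH) for x, y in zip(rx, ry))


def best_binary_column(x, W):
    """Exact argmin over binary h of ||x - W h||^2 by backtracking.

    Pruning assumes W is nonnegative (true for binary W): row i's final value
    can only grow from its partial sum s[i], by at most rem[j][i].
    """
    m, k = len(W), len(W[0])
    # rem[j][i] = W[i][j] + ... + W[i][k-1]: what columns j.. can still add to row i
    rem = [[sum(W[i][j:]) for i in range(m)] for j in range(k + 1)]
    best = {"err": float("inf"), "h": [0] * k}

    def lower_bound(s, j):
        # Per-row distance from x[i] to the reachable interval [s[i], s[i] + rem]
        lb = 0
        for i in range(m):
            lo, hi = s[i], s[i] + rem[j][i]
            if x[i] < lo:
                lb += (lo - x[i]) ** 2
            elif x[i] > hi:
                lb += (x[i] - hi) ** 2
        return lb

    def search(j, s, h):
        # At a leaf (j == k) the bound equals the exact error of h.
        if lower_bound(s, j) >= best["err"]:
            return  # prune: this branch cannot beat the incumbent
        if j == k:
            best["err"] = lower_bound(s, j)
            best["h"] = h[:]
            return
        for bit in (1, 0):
            h.append(bit)
            search(j + 1, [s[i] + bit * W[i][j] for i in range(m)], h)
            h.pop()

    search(0, [0] * m, [])
    return best["h"], best["err"]


def refine_H(X, W):
    """Replace H column by column with its exact binary optimum for fixed W."""
    cols = [best_binary_column([row[c] for row in X], W)[0] for c in range(len(X[0]))]
    return [list(r) for r in zip(*cols)]


X = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 1, 1, 1], [0, 0, 1, 1]]
# Hypothetical relaxed factors, standing in for a gradient-descent output.
W_relaxed = [[0.9, 0.1], [0.8, 0.0], [0.6, 0.7], [0.1, 0.9]]
H_relaxed = [[1.0, 0.9, 0.2, 0.55], [0.0, 0.4, 0.8, 0.9]]

W = threshold(W_relaxed)
H = threshold(H_relaxed)
H_exact = refine_H(X, W)
print(sq_error(X, W, H), sq_error(X, W, H_exact))  # 4 1
```

Because the thresholded H is itself a feasible binary solution, the exact per-column search can never do worse than plain thresholding; on this toy example it lowers the squared error from 4 to 1. The search may visit up to 2^k assignments per column in the worst case, so without stronger bounds it is only practical for small k.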
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Spyrides, G., Poggi, M., Lopes, H. (2023). Binary Matrix Factorization Discretization. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2023. Lecture Notes in Computer Science, vol 14126. Springer, Cham. https://doi.org/10.1007/978-3-031-42508-0_35
DOI: https://doi.org/10.1007/978-3-031-42508-0_35
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42507-3
Online ISBN: 978-3-031-42508-0
eBook Packages: Computer Science, Computer Science (R0)