Binary Matrix Factorization Discretization

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2023)

Abstract

Binary Matrix Factorization can be used at the core of many data analysis pipelines. It is used for clustering items, categorical characteristics of observations, and recommendation systems for users interacting with itemsets. The most common algorithms approximate the factorization through gradient descent. However, the resulting matrices are only approximately binary. When they are thresholded, the reconstruction error becomes so high that the matrices are no longer representative of the original. Therefore, the analyst must always choose between precision and explainability. We achieved theoretical results that greatly improve solving the exact subproblem of this factorization. These results enable a backtracking approach that can solve the linearized formulation of the subproblem in large binary matrices, taking advantage of their sparsity in real settings. Finally, we test this new approach by post-processing matrices yielded by gradient descent algorithms, using the new backtracking to obtain truly binary factorized matrices with a diminished reconstruction error, close to the level of what gradient descent is capable of finding. We tested our algorithm on gene expression datasets and found an error rate comparable to that of the relaxed continuous problem before discretization. The discretized matrices allow domain experts to question biclusters of gene expressions and the samples taken.
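The trade-off between a relaxed factorization and its thresholded version can be illustrated with a small sketch. This is not the backtracking method of the paper; it only shows, on a hand-made toy matrix, how rounding approximately binary factors at a cut-off of 0.5 can raise the squared reconstruction error. The matrices, the 0.5 cut-off, the use of the ordinary (non-Boolean) matrix product, and all function names are assumptions made for illustration.

```python
def matmul(A, B):
    """Ordinary matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def squared_error(X, W, H):
    """Squared Frobenius reconstruction error ||X - W H||^2."""
    R = matmul(W, H)
    return sum((x - r) ** 2
               for x_row, r_row in zip(X, R)
               for x, r in zip(x_row, r_row))


def threshold(M, t=0.5):
    """Round every entry to 0 or 1 using the cut-off t (assumed value)."""
    return [[1 if v >= t else 0 for v in row] for row in M]


# Toy binary data matrix (hypothetical, 3 observations x 3 features).
X = [[1, 1, 0],
     [1, 1, 1],
     [0, 1, 1]]

# Relaxed rank-2 factors, as a gradient-based method might return them:
# the middle row of W is only "approximately" binary.
W_relaxed = [[1.0, 0.0],
             [0.6, 0.6],
             [0.0, 1.0]]
H_relaxed = [[1.0, 1.0, 0.0],
             [0.0, 1.0, 1.0]]

relaxed_err = squared_error(X, W_relaxed, H_relaxed)

# Naive discretization: threshold both factors independently.
W_binary = threshold(W_relaxed)
H_binary = threshold(H_relaxed)
discrete_err = squared_error(X, W_binary, H_binary)

print("relaxed error:    ", relaxed_err)
print("thresholded error:", discrete_err)
```

In this toy case the thresholded factors are easier to read (each row of W states which of the two patterns an observation contains), but the overlap of the two patterns in the middle row is counted twice, so the error grows. A discretization step that searches over binary assignments, rather than rounding each entry independently, is the kind of post-processing the abstract describes.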

Author information

Corresponding author

Correspondence to Georges Spyrides.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Spyrides, G., Poggi, M., Lopes, H. (2023). Binary Matrix Factorization Discretization. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2023. Lecture Notes in Computer Science, vol 14126. Springer, Cham. https://doi.org/10.1007/978-3-031-42508-0_35

  • DOI: https://doi.org/10.1007/978-3-031-42508-0_35

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42507-3

  • Online ISBN: 978-3-031-42508-0

  • eBook Packages: Computer Science, Computer Science (R0)
