Abstract
Cancer classification methods that draw on established biological knowledge have become increasingly common, because they improve accuracy and make classification results easier to interpret biologically. Even so, current methods struggle to preserve the intricate structure of gene networks and to exploit the statistical information contained in gene data. In this paper, we introduce an adaptive hypergraph regularized logistic regression model that uses both established biological knowledge and the statistical information in gene data. Specifically, the model incorporates a hypergraph into the objective function, which preserves the complex gene network structure more effectively. The penalty term uses adaptive penalties, so that disease-related genes are selected according to their gene weights. The penalty term also constrains gene pairs with high statistical correlation, which reduces the number of redundant genes selected. Because the resulting model is nonconvex, we solve it with the block coordinate descent algorithm. In comparative experiments against established methods on real datasets, the proposed model achieves higher classification accuracy and selects genes relevant to the specific diseases studied.
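As a rough illustration of how a hypergraph can enter a logistic regression objective, the sketch below builds the normalized hypergraph Laplacian of Zhou et al. (2006) from a gene-by-hyperedge incidence matrix and evaluates a penalized objective with an adaptive L1 term. The exact objective of the proposed model is not given in this abstract, so the form used here, along with the names `hypergraph_laplacian`, `penalized_objective`, `adaptive_weights`, `lam1`, and `lam2`, is an assumption made for illustration. The correlation-based constraint on gene pairs and the block coordinate descent solver are not shown.

```python
import numpy as np


def _safe_power(d, power):
    """Return d**power elementwise, leaving zero-degree entries at 0."""
    out = np.zeros_like(d, dtype=float)
    mask = d > 0
    out[mask] = d[mask] ** power
    return out


def hypergraph_laplacian(H, edge_weights=None):
    """Normalized hypergraph Laplacian L = I - Dv^{-1/2} H W De^{-1} H^T Dv^{-1/2}.

    H is a (genes x hyperedges) 0/1 incidence matrix; a hyperedge could be,
    for example, the set of genes in one pathway (an assumption here).
    """
    H = np.asarray(H, dtype=float)
    n_genes, n_edges = H.shape
    w = np.ones(n_edges) if edge_weights is None else np.asarray(edge_weights, dtype=float)
    d_v = H @ w          # vertex degree: total weight of hyperedges containing each gene
    d_e = H.sum(axis=0)  # hyperedge degree: number of genes in each hyperedge
    s = _safe_power(d_v, -0.5)
    inv_de = _safe_power(d_e, -1.0)
    theta = (s[:, None] * H * (w * inv_de)[None, :]) @ (H.T * s[None, :])
    return np.eye(n_genes) - theta


def penalized_objective(beta, X, y, L, adaptive_weights, lam1, lam2):
    """Assumed objective: mean logistic loss + adaptive L1 + hypergraph smoothness."""
    z = X @ beta
    # Numerically stable negative log-likelihood for labels y in {0, 1}.
    nll = np.mean(np.logaddexp(0.0, z) - y * z)
    l1 = lam1 * np.sum(adaptive_weights * np.abs(beta))
    smooth = lam2 * (beta @ L @ beta)
    return nll + l1 + smooth


# Toy example: 4 genes, 2 hyperedges; gene 2 belongs to both.
H_demo = np.array([[1, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
L_demo = hypergraph_laplacian(H_demo)

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(6, 4))
y_demo = np.array([0, 1, 0, 1, 1, 0], dtype=float)
value = penalized_objective(
    beta=np.array([0.5, 0.0, -0.2, 0.1]),
    X=X_demo, y=y_demo, L=L_demo,
    adaptive_weights=np.ones(4), lam1=0.1, lam2=0.1,
)
print(float(value))
```

In this sketch the quadratic term `beta @ L @ beta` is small when genes that share hyperedges receive similar degree-scaled coefficients, which is one common way to encode group structure; the adaptive weights would typically be derived from an initial estimate, and are set to ones here only to keep the example short.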
Data availability and access
The data and code underlying this study are available on GitHub at https://github.com/AdaH-LR/AdaH.LR.
Acknowledgements
The authors would like to thank all the anonymous reviewers for their constructive advice.
Author information
Contributions
Conceptualization, Yong Jin and Huaibin Hou; methodology, Yong Jin and Huaibin Hou; validation, Huaibin Hou; formal analysis, Mian Qin and Zhen Zhang; investigation, Mian Qin and Wei Yang; data curation, Huaibin Hou and Wei Yang; writing (original draft preparation), Huaibin Hou; writing (review and editing), Yong Jin and Huaibin Hou; funding acquisition, Yong Jin. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Ethical and informed consent for the data used
This article does not involve any studies with human participants or animals performed by any of the authors.
Competing interests
No conflicts of interest exist in the submission of this manuscript, and the manuscript has been approved by all the authors for publication.
About this article
Cite this article
Jin, Y., Hou, H., Qin, M. et al. Adaptive hypergraph regularized logistic regression model for bioinformatic selection and classification. Appl Intell 54, 2349–2360 (2024). https://doi.org/10.1007/s10489-024-05304-5
DOI: https://doi.org/10.1007/s10489-024-05304-5