Abstract
Linear Dimensionality Reduction (LDR) methods have gained much attention over the last decades and have been used in data mining applications to reconstruct a given data matrix. The effectiveness of low-rank models in data science rests on the assumption that each row or column of the data matrix is associated with a bounded latent variable, and that the entries of the matrix are generated by applying a piecewise analytic function to these latent variables. Formally, LDR can be stated as an optimization problem, to which regularization terms are often added to enforce particular constraints emphasizing useful properties of the data. From this point of view, tuning the regularization hyperparameters (HPs), which control the weight of the additional constraints, is a problem better solved automatically than by trial and error. In this work, we focus on the role regularization HPs play in the Nonnegative Matrix Factorization (NMF) context and on how their correct choice affects the final results, providing a complete overview and new directions for a novel approach. Moreover, a novel bilevel formulation of regularization HP selection is proposed, which incorporates the HP choice directly into the unsupervised algorithm as part of the updating process.
N. Del Buono, F. Esposito and L. Selicato: all authors contributed equally to this work.
Notes
1. In bilevel programming, an outer optimization problem is solved subject to the optimality of an inner optimization problem.
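In symbols, a generic sketch of this scheme (using the notation of Notes 2 and 3; here \(F\) stands for an outer objective such as a validation error and \(L\) for the inner, regularized training objective — both names are assumptions for illustration): \(\min \limits _{\lambda \in \varLambda }{F(\omega ^{*}(\lambda ),\lambda )}\quad \text {s.t.}\quad \omega ^{*}(\lambda )\in \mathop {\mathrm {arg\,min}}\limits _{\omega \in \varOmega }{L(\omega ,\lambda )}\).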
2. \(\varLambda =\varLambda _1\times \dots \times \varLambda _d\) is the HP domain, where each set \(\varLambda _i\) can be real-valued (e.g., learning rate, regularization coefficient), integer-valued (e.g., number of layers), binary (e.g., whether to use early stopping), or categorical (e.g., choice of optimizer).
3. \(\omega \) can be a scalar, a vector, or a matrix. \(\varOmega =\varOmega _1 \times \dots \times \varOmega _n\) is the parameter domain, where each set \(\varOmega _j\) can be real-valued or integer-valued (e.g., weights in regression and classification, factors in matrix decompositions).
4. Note that the term “orthogonal” is to be understood as “soft-orthogonal”, indicating the orthogonality property of the columns or rows of the matrices W or H, respectively. With this clarification, the soft-orthogonal NMF problem can be defined as \(\min \limits _{W\ge 0,H\ge 0}{D_\beta (X,WH)}\qquad \text {s.t.}\quad W^\top W = I_r\quad \text {and/or}\quad HH^\top =I_r\).
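For illustration only, a minimal multiplicative-update sketch of soft-orthogonal NMF for the Frobenius case (\(\beta =2\)), enforcing \(HH^\top \approx I_r\) through a quadratic penalty with weight `lam` rather than as a hard constraint (the function name and all parameters are hypothetical, not the method of this paper):

```python
import numpy as np

def soft_orthogonal_nmf(X, r, lam=1.0, n_iter=500, seed=0):
    """Multiplicative updates for min ||X - WH||_F^2 + lam*||H H^T - I||_F^2, W,H >= 0.
    The penalty gradient 4*lam*(H H^T H - H) is split into its nonnegative
    parts: 2*lam*H joins the numerator, 2*lam*H H^T H the denominator,
    so the updates preserve nonnegativity."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    eps = 1e-10  # guards against division by zero
    for _ in range(n_iter):
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
        H *= (W.T @ X + 2 * lam * H) / (W.T @ W @ H + 2 * lam * (H @ H.T) @ H + eps)
    return W, H

# Toy usage: the rows of H drift toward (soft) orthonormality as lam grows.
X = np.abs(np.random.default_rng(1).random((20, 15)))
W, H = soft_orthogonal_nmf(X, r=3, lam=0.5)
```

Larger values of `lam` trade reconstruction accuracy for orthogonality, which is exactly the HP-tuning dilemma discussed in the paper.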
5. The \(\ell _0\) norm is not truly a norm, since the property of positive homogeneity is not satisfied. Nevertheless, since it can be expressed in terms of the \(\ell _p\) norm as \(\left\| x\right\| _0 = \lim \limits _{p\rightarrow 0}{\left\| x\right\| _p^p}\), it is referred to as a “norm” in the literature.
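A quick numeric check of this limit (an illustrative sketch, not part of the paper): for a fixed vector, \(\left\| x\right\| _p^p=\sum _i |x_i|^p\) approaches the number of nonzero entries as \(p\rightarrow 0\), since each nonzero term \(|x_i|^p\) tends to 1.

```python
import numpy as np

x = np.array([0.0, 2.0, 0.0, -0.5, 3.0])
print("l0 'norm' (nonzero count):", np.count_nonzero(x))  # -> 3

# ||x||_p^p = sum_i |x_i|^p; zeros contribute 0, each nonzero |x_i|^p -> 1 as p -> 0
for p in (1.0, 0.5, 0.1, 0.001):
    print(f"p = {p:>5}: ||x||_p^p = {np.sum(np.abs(x) ** p):.4f}")
```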
6. The Hoyer sparsity measure is computed from the normalized ratio of the \(\ell _1\) and \(\ell _2\) norms.
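A minimal sketch of that computation (the function name is hypothetical), using the standard normalization \(\mathrm {sp}(x)=\big (\sqrt{n}-\left\| x\right\| _1/\left\| x\right\| _2\big )/\big (\sqrt{n}-1\big )\) so that the measure lies in \([0,1]\):

```python
import numpy as np

def hoyer_sparsity(x):
    """Hoyer sparsity in [0, 1]: 1 for a single-nonzero vector, 0 for a constant one."""
    x = np.asarray(x, dtype=float)
    n = x.size
    ratio = np.abs(x).sum() / np.linalg.norm(x)  # l1 / l2
    return (np.sqrt(n) - ratio) / (np.sqrt(n) - 1)

print(hoyer_sparsity([0, 0, 0, 5]))  # -> 1.0 (maximally sparse)
print(hoyer_sparsity([1, 1, 1, 1]))  # -> 0.0 (fully dense)
```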
7. For particular values of \(\beta \) and specific regularization functions.
Acknowledgments
This work was supported in part by the GNCS-INDAM (Gruppo Nazionale per il Calcolo Scientifico of Istituto Nazionale di Alta Matematica) Francesco Severi, P.le Aldo Moro, Roma, Italy. The author F.E. was funded by REFIN Project, grant number 363BB1F4, Reference project idea UNIBA027 “Un modello numerico-matematico basato su metodologie di algebra lineare e multilineare per l’analisi di dati genomici” (“A numerical-mathematical model based on linear and multilinear algebra methodologies for the analysis of genomic data”).
Copyright information
© 2022 Springer Nature Switzerland AG
Cite this paper
Del Buono, N., Esposito, F., Selicato, L. (2022). Toward a New Approach for Tuning Regularization Hyperparameter in NMF. In: Nicosia, G., et al. (eds.) Machine Learning, Optimization, and Data Science. LOD 2021. Lecture Notes in Computer Science, vol. 13163. Springer, Cham. https://doi.org/10.1007/978-3-030-95467-3_36
DOI: https://doi.org/10.1007/978-3-030-95467-3_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95466-6
Online ISBN: 978-3-030-95467-3
eBook Packages: Computer Science, Computer Science (R0)