
Toward a New Approach for Tuning Regularization Hyperparameter in NMF

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13163)

Abstract

Linear Dimensionality Reduction (LDR) methods have gained much attention in recent decades and have been used in data mining applications to reconstruct a given data matrix. The effectiveness of low-rank models in data science is justified by the assumption that each row or column of the data matrix is associated with a bounded latent variable, and that the entries of the matrix are generated by applying a piecewise analytic function to these latent variables. Formally, LDR can be cast as an optimization problem to which regularization terms are often added to enforce particular constraints emphasizing useful properties of the data. From this point of view, tuning the regularization hyperparameters (HPs), which control the weight of the additional constraints, is a problem worth solving automatically rather than by trial and error. In this work, we focus on the role regularization HPs play in the Nonnegative Matrix Factorization (NMF) context and on how their choice affects the resulting factorization, providing a complete overview and new directions for a novel approach. Moreover, we propose a novel bilevel formulation of regularization HP selection that incorporates the HP choice directly into the unsupervised algorithm as part of the updating process.

N. Del Buono, F. Esposito and L. Selicato—All authors contributed equally to this work.


Notes

  1. In bilevel programming, an outer optimization problem is solved subject to the optimality of an inner optimization problem. (An illustrative trial-and-error baseline for this scheme in the NMF setting is sketched after these notes.)

  2. \(\varLambda =\varLambda _1\times \dots \times \varLambda _d\) is the HP domain, where each set \(\varLambda _i\) can be real-valued (e.g., learning rate, regularization coefficient), integer-valued (e.g., number of layers), binary (e.g., whether or not to use early stopping), or categorical (e.g., choice of optimizer).

  3. \(\omega \) can be a scalar, a vector, or a matrix. \(\varOmega =\varOmega _1 \times \dots \times \varOmega _n\) is the parameter domain, where each set \(\varOmega _j\) can be real-valued or integer-valued (e.g., weights in regression and classification models, factors in matrix decompositions).

  4. Note that the term “orthogonal” is to be understood as “soft-orthogonal”, indicating the orthogonality of the columns of W or the rows of H, respectively. With this clarification, the soft-orthogonal NMF problem can be defined as \(\min \limits _{W\ge 0,H\ge 0}{D_\beta (X,WH)}\qquad \text {s.t.}\quad W^\top W = I_r\quad \text {and/or}\quad HH^\top =I_r\). (A standard penalized relaxation of this constraint is sketched after these notes.)

  5. The \(\ell _0\) norm is not truly a norm, since it does not satisfy positive homogeneity. Nevertheless, since it can be expressed as a limit of \(\ell _p\) norms, \(\left\| x\right\| _0 = \lim \limits _{p\rightarrow 0}{\left\| x\right\| _p^p}\) (each term \(|x_i|^p\) tends to 1 when \(x_i\ne 0\) and equals 0 otherwise), it is commonly referred to as a “norm” in the literature.

  6. The Hoyer sparsity measure is computed as the normalized ratio of the \(\ell _1\) and \(\ell _2\) norms (an illustrative implementation is given after these notes).

  7. For particular values of \(\beta \) and specific regularization functions.
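
To make the trial-and-error baseline from note 1 concrete in the NMF setting, the following minimal Python sketch treats an \(\ell _1\)-regularized NMF solved by standard multiplicative updates as the inner problem, and a naive grid search over the penalty weight \(\lambda \), scored on held-out columns, as the outer problem. The function names, the validation scheme, and the grid are our own illustrative choices, not the paper's method, which instead folds the HP choice into the updating process itself.

```python
import numpy as np

def sparse_nmf(X, r, lam, n_iter=300, eps=1e-9, seed=0):
    """Inner problem: min_{W,H >= 0} 0.5*||X - WH||_F^2 + lam*sum(H).

    Solved with the standard multiplicative updates; the l1 penalty on H
    simply adds `lam` to the denominator of the H update.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + lam + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

def tune_lambda_by_grid(X, r, grid, val_frac=0.2, seed=0):
    """Outer problem, solved the naive way: refit for every candidate lam
    and keep the one with the lowest error on held-out columns."""
    rng = np.random.default_rng(seed)
    val = rng.random(X.shape[1]) < val_frac
    best_lam, best_err = None, np.inf
    for lam in grid:
        W, _ = sparse_nmf(X[:, ~val], r, lam, seed=seed)
        # Score on validation columns: fit their coefficients Hv with
        # W held fixed (H-type multiplicative update only).
        Hv = np.full((r, int(val.sum())), 0.5)
        for _ in range(100):
            Hv *= (W.T @ X[:, val]) / (W.T @ W @ Hv + lam + 1e-9)
        err = np.linalg.norm(X[:, val] - W @ Hv)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam

# Example usage (X: nonnegative data matrix):
# best_lam = tune_lambda_by_grid(X, r=10, grid=[0.0, 0.01, 0.1, 1.0])
```

Each grid point requires a full refit of the factorization, which is exactly the cost the bilevel formulation aims to avoid by updating \(\lambda \) alongside W and H.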
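
Note 4 states the hard-constrained, soft-orthogonal problem. In practice the constraint is often relaxed into a penalty, which is precisely where a regularization HP enters; a common penalized form (a standard relaxation, not a formula taken from this paper) is

\[ \min _{W\ge 0,\,H\ge 0}\; D_\beta (X,WH) \;+\; \lambda \left\| HH^\top - I_r\right\| _F^2, \qquad \lambda \ge 0, \]

where larger values of \(\lambda \) push the rows of H toward orthonormality, approaching the hard-constrained problem.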
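
The measure in note 6 is easy to state in code. Below is a minimal implementation following Hoyer's standard definition (the function name is ours): it returns 0 for a constant vector and 1 for a vector with a single nonzero entry.

```python
import numpy as np

def hoyer_sparseness(x, eps=1e-12):
    """Normalized l1/l2 ratio: (sqrt(n) - ||x||_1 / ||x||_2) / (sqrt(n) - 1)."""
    x = np.abs(np.asarray(x, dtype=float).ravel())
    n = x.size
    ratio = x.sum() / (np.sqrt((x * x).sum()) + eps)
    return (np.sqrt(n) - ratio) / (np.sqrt(n) - 1)

# hoyer_sparseness([1, 0, 0, 0]) -> 1.0 (maximally sparse)
# hoyer_sparseness([1, 1, 1, 1]) -> 0.0, up to eps (no sparsity)
```

Applied column-wise to H, it gives a scale-invariant way to monitor how a sparsity-promoting \(\lambda \) reshapes the factors.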


Acknowledgments

This work was supported in part by the GNCS-INdAM (Gruppo Nazionale per il Calcolo Scientifico of the Istituto Nazionale di Alta Matematica “Francesco Severi”), P.le Aldo Moro, Roma, Italy. The author F.E. was funded by the REFIN Project, grant number 363BB1F4, reference project idea UNIBA027 “Un modello numerico-matematico basato su metodologie di algebra lineare e multilineare per l’analisi di dati genomici” (“A numerical-mathematical model based on linear and multilinear algebra methodologies for the analysis of genomic data”).

Author information


Corresponding author

Correspondence to Laura Selicato.



Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Del Buono, N., Esposito, F., Selicato, L. (2022). Toward a New Approach for Tuning Regularization Hyperparameter in NMF. In: Nicosia, G., et al. (eds.) Machine Learning, Optimization, and Data Science. LOD 2021. Lecture Notes in Computer Science, vol. 13163. Springer, Cham. https://doi.org/10.1007/978-3-030-95467-3_36


  • DOI: https://doi.org/10.1007/978-3-030-95467-3_36

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95466-6

  • Online ISBN: 978-3-030-95467-3

  • eBook Packages: Computer Science (R0)
