Unsupervised nested Dirichlet finite mixture model for clustering


Abstract

The Dirichlet distribution is widely used in the context of mixture models. Despite its flexibility, it suffers from limitations such as its restrictive covariance structure and the direct dependence between its mean and variance. In this work, a generalization of the Dirichlet distribution, namely the Nested Dirichlet distribution, is introduced in the context of finite mixture models; its hierarchical structure provides more flexibility and overcomes the aforementioned drawbacks. Model learning is based on the generalized expectation-maximization algorithm, where the parameters are initialized with the method of moments and estimated through the iterative Newton-Raphson method. Moreover, the minimum message length criterion is used to determine the number of components that best describes the data clusters under the finite mixture model. The Nested Dirichlet distribution is proven to belong to the exponential family, which offers several advantages, such as closed-form expressions for several probabilistic distances. The performance of the Nested Dirichlet mixture model is compared to the Dirichlet mixture model, the generalized Dirichlet mixture model, and a Convolutional Neural Network as a deep learning baseline. The effectiveness of the proposed framework is validated through this comparison on challenging datasets. The hierarchical nature of the model is further applied to challenging real-world tasks, namely hierarchical cluster analysis and hierarchical feature learning, showing a significant improvement in accuracy.

References

  1. UCI Machine Learning Repository: Optical recognition of handwritten digits data set. https://archive.ics.uci.edu/ml/datasets/optical%2Brecognition%2Bof%2Bhandwritten%2Bdigits

  2. Alalyan F, Zamzami N, Bouguila N (2019) Model-based hierarchical clustering for categorical data. In 2019 IEEE 28th International Symposium on Industrial Electronics (ISIE), pp 1424–1429. IEEE

  3. Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel MA, Al-Amidie M, Farhan L (2021) Review of deep learning: concepts, cnn architectures, challenges, applications, future directions. J Big Data 8:1–74

  4. Azam M, Bouguila N (2016) Speaker classification via supervised hierarchical clustering using ica mixture model. In International Conference on Image and Signal Processing, pp 193–202. Springer

  5. Baxter RA (2010) Minimum Message Length, pp 668–674. Springer US, Boston, MA. ISBN 978-0-387-30164-8. https://doi.org/10.1007/978-0-387-30164-8_542

  6. Bouguila N (2008) Clustering of count data using generalized dirichlet multinomial distributions. IEEE Trans Knowl Data Eng 20(4):462–474

  7. Bouguila N (2012) Hybrid generative/discriminative approaches for proportional data modeling and classification. IEEE Trans Knowl Data Eng 24(12):2184–2202. https://doi.org/10.1109/TKDE.2011.162

  8. Bouguila N (2012) Hybrid generative/discriminative approaches for proportional data modeling and classification. IEEE Trans Knowl Data Eng 24(12):2184–2202

  9. Bouguila N (2013) Deriving kernels from generalized dirichlet mixture models and applications. Inform Process Manag 49(1):123–137

  10. Bouguila N, Amayri O (2009) A discrete mixture-based kernel for svms: Application to spam and image categorization. Inform Process Manag 45(6):631–642

  11. Bouguila N, Fan W (2020) Mixture models and applications. Springer

  12. Bouguila N, Ziou D (2005) Mml-based approach for high-dimensional unsupervised learning using the generalized dirichlet mixture. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05)-Workshops, pp 53–53. IEEE

  13. Bouguila N, Ziou D (2007) High-dimensional unsupervised selection and estimation of a finite generalized dirichlet mixture model based on minimum message length. IEEE Trans Pattern Anal Mach Intell 29(10):1716–1731

  14. Bouguila N, Ziou D, Vaillancourt J (2004) Unsupervised learning of a finite mixture model based on the dirichlet distribution and its application. IEEE Trans Image Process 13(11):1533–1543

  15. Bourouis S, Bouguila N (2023) Expectation propagation learning of finite and infinite gamma mixture models and its applications. Multimed Tool Appl, pp 1–18

  16. Bourouis S, Zaguia A, Bouguila N, Alroobaea R (2018) Deriving probabilistic svm kernels from flexible statistical mixture models and its application to retinal images classification. IEEE Access 7:1107–1117

  17. Bourouis S, Alharbi A, Bouguila N (2021a) Bayesian learning of shifted-scaled dirichlet mixture models and its application to early covid-19 detection in chest x-ray images. J Imag, 7(1). ISSN 2313-433X. https://doi.org/10.3390/jimaging7010007. https://www.mdpi.com/2313-433X/7/1/7

  18. Bourouis S, Alharbi A, Bouguila N (2021b) Bayesian learning of shifted-scaled dirichlet mixture models and its application to early covid-19 detection in chest x-ray images. J Imag, 7(1). ISSN 2313-433X. https://doi.org/10.3390/jimaging7010007. https://www.mdpi.com/2313-433X/7/1/7

  19. Campbell M, Galligan M, Saldova R, Rudd P, Murphy T (2011) Application of compositional models for glycan hilic data. 01

  20. Carpenter B (2010) Integrating out multinomial parameters in latent dirichlet allocation and naive bayes for collapsed gibbs sampling. Rapport Technique 4:464

  21. Chen J, Gong Z, Liu W (2019) A nonparametric model for online topic discovery with word embeddings. Inform Sci, 504:32–47. ISSN 0020-0255. https://doi.org/10.1016/j.ins.2019.07.048. https://www.sciencedirect.com/science/article/pii/S0020025519306541

  22. Chen J, Gong Z, Liu W (2020) A dirichlet process biterm-based mixture model for short text stream clustering. Appl Intell 50:1609–1619

  23. Connor RJ, Mosimann JE (1969) Concepts of independence for proportions with a generalization of the dirichlet distribution. J Amer Stat Assoc 64(325):194–206

  24. Dennis SY III (1991) On the hyper-dirichlet type 1 and hyper-liouville distributions. Communications in Statistics-Theory and Methods 20(12):4069–4081

  25. Epaillard E, Bouguila N (2016) Proportional data modeling with hidden markov models based on generalized dirichlet and beta-liouville mixtures applied to anomaly detection in public areas. Pattern Recogn 55:125–136

  26. Fisher RA. Iris dataset. https://archive-beta.ics.uci.edu/ml/datasets/iris

  27. Genovese CR, Wasserman L (2000) Rates of convergence for the Gaussian mixture sieve. Annal Stat 28(4):1105–1127

  28. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT press

  29. Graybill FA (1983) Matrices with applications in statistics. p 461

  30. Guo Z, Wang ZJ (2012) An unsupervised hierarchical feature learning framework for one-shot image recognition. IEEE Trans Multimed 15(3):621–632

  31. Hamdi M, Hilali-Jaghdam I, Elnaim BE, Elhag AA (2023) Forecasting and classification of new cases of covid 19 before vaccination using decision trees and gaussian mixture model. Alexandria Eng J, 62:327–333. ISSN 1110-0168. https://doi.org/10.1016/j.aej.2022.07.011. https://www.sciencedirect.com/science/article/pii/S111001682200463X

  32. Harris C, Stephens M et al (1988) A combined corner and edge detector. In Alvey vision conference, vol 15, pp 10–5244. Citeseer

  33. Ihou KE, Bouguila N (2017) A new latent generalized dirichlet allocation model for image classification. In 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA), pp 1–6. IEEE

  34. Ishak S, Chowdary AJ (2021) Evaluating robustness of a cnn architecture introduced to the adversarial attacks

  35. Janosi A, Steinbrunn W, Pfisterer M, Detrano R (1988) Heart Disease. UCI Machine Learning Repository

  36. Jebara T, Kondor R, Howard A (2004) Probability product kernels. J Mach Learn Res 5:819–844

  37. Jefferys WH, Berger JO (1992) Ockham’s razor and bayesian analysis. American Scientist 80(1):64–72

  38. Qiang J, Yanfeng S, Junbin G, Hu Y, Baocai Y (2022) A decoder-free variational deep embedding for unsupervised clustering. IEEE Trans Neural Netw Learn Syst 33(10):5681–5693. https://doi.org/10.1109/TNNLS.2021.3071275

  39. Jiang Z, Zheng Y, Tan H, Tang B, Zhou H (2017) Variational deep embedding: an unsupervised and generative approach to clustering

  40. Kavitha R, Jothi DK, Saravanan K, Swain MP, Gonzáles JLA, Bhardwaj RJ, Adomako E et al (2023) Ant colony optimization-enabled cnn deep learning technique for accurate detection of cervical cancer. BioMed Res Int, 2023

  41. Kondor R, Jebara T (2003) A kernel between sets of vectors. In Proceedings of the 20th international conference on machine learning (ICML-03), pp 361–368

  42. Koslovsky MD, Vannucci M (2020) Microbvs: Dirichlet-tree multinomial regression models with bayesian variable selection-an r package. BMC Bioinformatics 21:1–10

  43. Krizhevsky A, Nair V, Hinton G. CIFAR-10 (Canadian Institute for Advanced Research). http://www.cs.toronto.edu/~kriz/cifar.html

  44. Kullback S, Leibler RA (1951) On information and sufficiency. The Annal Math Stat 22(1):79–86

  45. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06), vol 2, pp 2169–2178. IEEE

  46. Liu S, Huang D, Wang Y (2019) Adaptive nms: Refining pedestrian detection in a crowd. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6459–6468

  47. Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110

  48. Lyu S, Wang Z, Ling C, Chen H (2022) Better lattice quantizers constructed from complex integers. IEEE Trans Commun 70(12):7932–7940

  49. Mangaroska K, Martinez-Maldonado R, Vesin B, Gašević D (2021) Challenges and opportunities of multimodal data in human learning: The computer science students’ perspective. J Comput Assisted Learn, 37(4):1030–1047. https://doi.org/10.1111/jcal.12542

  50. Manouchehri N, Bouguila N (2019) A probabilistic approach based on a finite mixture model of multivariate beta distributions. ICEIS 1:373–380

  51. Masoudnia S, Mersa O, Araabi BN, Vahabie A-H, Sadeghi MA, Ahmadabadi MN (2019) Multi-representational learning for offline signature verification using multi-loss snapshot ensemble of cnns. Expert Syst Appl, 133:317–330. ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2019.03.040. https://www.sciencedirect.com/science/article/pii/S0957417419301666

  52. McLachlan GJ, Lee SX, Rathnayake SI (2019) Finite mixture models. Annual Rev Stat Appl 6(1):355–378

  53. McLachlan GJ, Lee SX, Rathnayake SI (2019) Finite mixture models. Annual Rev Stat Appl 6(1):355–378. https://doi.org/10.1146/annurev-statistics-031017-100325

  54. McLachlan GJ, Lee SX, Rathnayake SI (2019) Finite mixture models. Annual Rev Stat Appl 6(1):355–378. https://doi.org/10.1146/annurev-statistics-031017-100325

  55. Minka TP (2000) Estimating a dirichlet distribution. https://tminka.github.io/papers/dirichlet/minka-dirichlet.pdf

  56. Minka T (1999) The dirichlet-tree distribution. https://www.microsoft.com/en-us/research/publication/dirichlet-tree-distribution/

  57. Monti GS, Mateu-Figueras G, Pawlowsky-Glahn V (2011) Notes on the Scaled Dirichlet Distribution, chapter 10, pp 128–138. John Wiley & Sons, Ltd. ISBN 9781119976462. https://doi.org/10.1002/9781119976462.ch10

  58. Moradi S, Zayed T, Nasiri F, Golkhoo F (2020) Automated anomaly detection and localization in sewer inspection videos using proportional data modeling and deep learning-based text recognition. J Infrastructure Syst 26(3):04020018. https://doi.org/10.1061/(ASCE)IS.1943-555X.0000553

  59. Munkhammar J, Mattsson L, Rydén J (2017) Polynomial probability distribution estimation using the method of moments. PLoS ONE 12(4):e0174573

  60. Najar F, Bouguila N (2020) Image categorization using agglomerative clustering based smoothed dirichlet mixtures. In International Symposium on Visual Computing, pp 27–38. Springer

  61. Najar F, Bouguila N (2022) Emotion recognition: A smoothed dirichlet multinomial solution. Eng Appl Artif Intell, 107:104542. ISSN 0952-1976. https://doi.org/10.1016/j.engappai.2021.104542. https://www.sciencedirect.com/science/article/pii/S0952197621003900

  62. Nasfi R, Amayri M, Bouguila N (2020) A novel approach for modeling positive vectors with inverted dirichlet-based hidden markov models. Knowledge-Based Syst 192:105335

  63. Null B (2008) The nested dirichlet distribution: properties and applications. 11

  64. Null B (2009) Modeling baseball player ability with a nested dirichlet distribution. J Quantitative Anal Sports 5:5–5

  65. Oboh BS, Bouguila N (2017) Unsupervised learning of finite mixtures using scaled dirichlet distribution and its application to software modules categorization. In 2017 IEEE international conference on industrial technology (ICIT), pp 1085–1090. IEEE

  66. Ombabi AH, Ouarda W, Alimi AM (2020) Deep learning cnn-lstm framework for arabic sentiment analysis using textual information shared in social networks. Social Netw Anal Mining 10:1–13

  67. Palaz D, Magimai-Doss M, Collobert R (2019) End-to-end acoustic modeling using convolutional neural networks for hmm-based automatic speech recognition. Speech Commun, 108:15–32. ISSN 0167-6393. https://doi.org/10.1016/j.specom.2019.01.004. https://www.sciencedirect.com/science/article/pii/S0167639316301625

  68. Patrício M, Caramelo F, Seiça R, Matafome P, Crisóstomo J, Pereira J. Breast cancer coimbra data set. https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Coimbra

  69. Cinelli LP, Marins MA, Barros da Silva EA, Netto SL (2021) Variational Autoencoder, pp 111–149. Springer International Publishing, Cham

  70. Rahman MH, Bouguila N (2021) Efficient feature mapping in classifying proportional data. IEEE Access 9:3712–3724. https://doi.org/10.1109/ACCESS.2020.3047536

  71. Rissanen J (1978) Modeling by shortest data description. Automatica 14(5):465–471

  72. Roberts SJ, Husmeier D, Rezek I, Penny W (1998) Bayesian approaches to gaussian mixture modeling. IEEE Trans Pattern Anal Mach Intell 20:1133–1142

  73. Ronning G (1989) Maximum likelihood estimation of dirichlet distributions. J Stat Comput Simul 32(4):215–221

  74. Sakamoto Y, Ishiguro M, Kitagawa G (1986) Akaike information criterion statistics. Dordrecht, The Netherlands: D. Reidel, 81(10.5555):26853

  75. Samsami M, Wagner R (2021) Investment decisions with endogeneity: A dirichlet tree analysis. J Risk Financial Manag 14(7):299

  76. Sharma P, Berwal YPS, Ghai W (2020) Performance analysis of deep learning cnn models for disease detection in plants using image segmentation. Inform Process Agriculture, 7(4):566–574. ISSN 2214-3173. https://doi.org/10.1016/j.inpa.2019.11.001. https://www.sciencedirect.com/science/article/pii/S2214317319301957

  77. Singh JP, Bouguila N (2018) Intrusion detection using unsupervised approach. In Emerging Technologies for Developing Countries: First International EAI Conference, AFRICATEK 2017, Marrakech, Morocco, March 27-28, 2017 Proceedings 1st, pp 192–201. Springer

  78. Singhal A, Singh P, Lall B, Joshi SD (2020) Modeling and prediction of covid-19 pandemic using gaussian mixture model. Chaos, Solitons Fractals 138:110023

    Article  MathSciNet  Google Scholar 

  79. Wallace CS (2005) Statistical and inductive inference by minimum message length. Springer

  80. Wang T, Zhao H (2021) Statistical methods for analyzing tree-structured microbiome data. Statistical Analysis of Microbiome Data, pp 193–220

  81. Weakliem DL (1999) A critique of the bayesian information criterion for model selection. Sociological Method Res 27(3):359–397

  82. Yang L, Fan W, Bouguila N (2021) Deep clustering analysis via dual variational autoencoder with spherical latent embeddings. IEEE Transactions on Neural Networks and Learning Systems, pp 1–10. https://doi.org/10.1109/TNNLS.2021.3135460

  83. Yang L, Fan W, Bouguila N (2022) Clustering analysis via deep generative models with mixture models. IEEE Trans Neural Netw Learn Syst 33(1):340–350. https://doi.org/10.1109/TNNLS.2020.3027761

  84. Yin H (2010) Scene classification using spatial pyramid matching and hierarchical dirichlet processes

  85. Yin J, Chao D, Liu Z, Zhang W, Yu X, Wang J (2018) Model-based clustering of short text streams. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’18, pp 2634–2642, New York, NY, USA. Association for Computing Machinery. ISBN 9781450355520. https://doi.org/10.1145/3219819.3220094

  86. Zamzami N, Bouguila N (2019) Model selection and application to high-dimensional count data clustering. Appl Intell 49(4):1467–1488

  87. Zamzami N, Bouguila N (2019b) A novel scaled dirichlet-based statistical framework for count data modeling: Unsupervised learning and exponential approximation. Pattern Recogn, 95:36–47. ISSN 0031-3203. https://doi.org/10.1016/j.patcog.2019.05.038. https://www.sciencedirect.com/science/article/pii/S0031320319302237

  88. Zamzami N, Bouguila N (2019) Hybrid generative discriminative approaches based on multinomial scaled dirichlet mixture models. Appl Intell 49(11):3783–3800. https://doi.org/10.1007/s10489-019-01437-0

  89. Zamzami N, Bouguila N (2019d) An accurate evaluation of msd log-likelihood and its application in human action recognition. In 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp 1–5. https://doi.org/10.1109/GlobalSIP45357.2019.8969324

  90. Zamzami N, Bouguila N (2020) High-dimensional count data clustering based on an exponential approximation to the multinomial beta-liouville distribution. Inform Sci 524:116–135

  91. Ziou D, Bouguila N (2004) Unsupervised learning of a finite gamma mixture using MML: application to SAR image analysis. In 17th International Conference on Pattern Recognition, ICPR 2004, Cambridge, UK, pp 68–71. IEEE Computer Society

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Corresponding author

Correspondence to Fares Alkhawaja.

Ethics declarations

Conflicts of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: The connection between NDD and Dirichlet-tree distributions

The derivation of the PDF of both the NDD [63] and the Dirichlet-tree distribution [56] is based on the distribution introduced in [24]. Both distributions are built on representing the Dirichlet distribution as a finite stochastic process, generalizing its covariance structure.

A.1 Derivation of NDD

The NDD can be viewed as a representation of the original variables of the Dirichlet distribution together with the nesting variables, where each nesting level also follows a Dirichlet distribution. Therefore, the NDD can be derived directly from the PDF of the Dirichlet distribution, with (\(x_{d+1},x_{d+2},\ldots ,x_{d+K}\)) determined by (\(x_1,x_2,\ldots ,x_d\)).

Consequently,

$$\begin{aligned} P(x_1,x_2,\ldots ,x_{d+K})&=P(x_1,x_2,\ldots ,x_d)\,P(x_{d+1},x_{d+2},\ldots ,x_{d+K}\mid x_1,x_2,\ldots ,x_d)\\ &=P(x_1,x_2,\ldots ,x_d) \end{aligned}$$
(A1)

By denoting \(X_0\sim \mathcal {D}(\alpha _{0})\) for the unnested variables (root node variables) and \(X_k \ \forall k>0\), where \(X_k=\tilde{X}_k x_{d+k}\) given that \(\tilde{X}_k\sim \mathcal {D}(\alpha _{k})\), the following expression can be written:

$$\begin{aligned} P(x_1,x_2,\ldots ,x_d)&=P(X_0,X_1,\ldots ,X_K)\\ &=P(X_0)\,P(X_1,\ldots ,X_K\mid X_0) \qquad \textit{(using the chain rule)}\\ &=P(X_0)\,P(X_1\mid X_0)\,P(X_2\mid X_0,X_1)\cdots P(X_K\mid X_0,X_1,\ldots ,X_{K-1}) \qquad \textit{(using the notation above)}\\ &=P(X_0)\,P(X_1\mid x_{d+1})\,P(X_2\mid x_{d+2})\cdots P(X_K\mid x_{d+K}) \end{aligned}$$
(A2)

Due to the change of variable,

$$\begin{aligned} P(X_k\mid x_{d+k})&=P(\tilde{X}_k x_{d+k}\mid x_{d+k})\,|\textbf{J}_{X_k \longrightarrow \tilde{X}_k}|\\ &=P(\tilde{X}_k)\,|\textbf{J}_{X_k \longrightarrow \tilde{X}_k}| \end{aligned}$$
(A3)

where \(|\textbf{J}_{X_k \longrightarrow \tilde{X}_k}|\) is the determinant of Jacobian matrix derived in [23]. The Jacobian matrix is added as a result of the change of variable rule. According to [23],

$$\begin{aligned} |\textbf{J}_{X_k \longrightarrow \tilde{X}_k}|=\begin{vmatrix} \frac{1}{x_{d+k}} & 0 & \cdots & 0 \\ 0 & \frac{1}{x_{d+k}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{x_{d+k}} \end{vmatrix}=\left( \frac{1}{x_{d+k}}\right) ^{|X_k|-1} \end{aligned}$$
(A4)

Therefore,

$$\begin{aligned} P(X_k\mid x_{d+k})=P(\tilde{X}_k)\left( \frac{1}{x_{d+k}}\right) ^{|X_k|-1} \end{aligned}$$
(A5)
$$\begin{aligned} P(x_1,x_2,\ldots ,x_{d+K})&=P(X_0)\prod _{k=1}^K P(\tilde{X}_k)\left( \frac{1}{x_{d+k}}\right) ^{|X_k|-1}\\ &=\frac{1}{B\left( A_0\right) }\left( \prod _{j \in I_0} x_j^{\alpha _j-1}\right) \left( \prod _{k=1}^K \frac{1}{B\left( A_k\right) }\left( \prod _{j \in I_k}\left( \frac{x_j}{x_{d+k}}\right) ^{\alpha _j-1}\right) \left( \frac{1}{x_{d+k}}\right) ^{|X_k|-1}\right) \\ &=\left( \prod _{k=0}^K \frac{1}{B\left( A_k\right) }\right) \left( \prod _{j=1}^{d+K} x_j^{\alpha _j-1}\right) \left( \prod _{k=1}^K \left( \frac{1}{x_{d+k}}\right) ^{\left( \sum _{j \in I_k}\alpha _j\right) -1}\right) \\ &=\left( \prod _{k=0}^K \frac{1}{B\left( A_k\right) }\right) \left( \prod _{j=1}^{d+K} x_j^{\alpha _j-1}\right) \left( \prod _{k=1}^K \left( \frac{1}{x_{d+k}}\right) ^{\bar{A}_k-1}\right) \\ &=\left( \prod _{k=0}^K \frac{1}{B\left( A_k\right) }\right) \left( \prod _{j=1}^{d+K} x_j^{\alpha _j-1}\right) \left( \prod _{k=1}^K x_{d+k}^{1-\bar{A}_k}\right) \\ &=\left( \prod _{k=0}^K \frac{1}{B\left( A_k\right) }\right) \left( \prod _{j=1}^d x_j^{\alpha _j-1}\right) \left( \prod _{k=1}^K x_{d+k}^{\alpha _{d+k}-\bar{A}_k}\right) \\ &=\frac{\left( \prod _{j=1}^d x_j^{\alpha _j-1}\right) \left( \prod _{k=1}^K x_{d+k}^{\alpha _{d+k}-\bar{A}_k}\right) }{\prod _{k=0}^K B\left( A_k\right) } \end{aligned}$$
(A6)
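
As a concrete illustration of the closed form in (A6), the following minimal sketch evaluates the NDD log-density for a two-level tree; the array layout, the `groups` encoding of the index sets \(I_k\), and the function names are assumptions of this sketch rather than notation from the derivation above.

```python
import numpy as np
from scipy.special import gammaln

def log_B(a):
    # log of the multivariate beta function: B(A) = prod_j Gamma(a_j) / Gamma(sum_j a_j)
    return np.sum(gammaln(a)) - gammaln(np.sum(a))

def ndd_logpdf(x, alpha, groups):
    """Log-density of (A6) for a two-level nested Dirichlet.

    x      : length d+K array, the d leaves x_1..x_d followed by the K nested sums x_{d+1..d+K}
    alpha  : length d+K array, one parameter per tree node below the root
    groups : list of K index lists; groups[k] holds the leaf indices I_k under nested node k
    """
    x, alpha = np.asarray(x, float), np.asarray(alpha, float)
    K = len(groups)
    d = len(x) - K
    nested = {j for g in groups for j in g}
    # root Dirichlet A_0: unnested leaves plus the K nested-sum nodes
    A0 = np.array([alpha[j] for j in range(d) if j not in nested] + list(alpha[d:]))
    # numerator: prod_j x_j^{alpha_j - 1} * prod_k x_{d+k}^{alpha_{d+k} - Abar_k}
    log_num = np.sum((alpha[:d] - 1.0) * np.log(x[:d]))
    log_den = log_B(A0)
    for k, I_k in enumerate(groups):
        A_k = alpha[list(I_k)]
        log_num += (alpha[d + k] - A_k.sum()) * np.log(x[d + k])
        log_den += log_B(A_k)
    return log_num - log_den

# example: d = 4 leaves, K = 2 nested nodes grouping (x_1, x_2) and (x_3, x_4)
x = np.array([0.1, 0.2, 0.3, 0.4, 0.3, 0.7])   # the last two entries are the nested sums
alpha = np.array([2.0, 3.0, 1.5, 2.5, 4.0, 6.0])
print(ndd_logpdf(x, alpha, groups=[[0, 1], [2, 3]]))
```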

A.2 Derivation of Dirichlet-tree distribution

In [56], the function (\(\delta \)) is introduced to control the contribution of each branch to the associated nodes as illustrated in this section.

The probability for a certain sample (x) for a set of leaf probabilities \((p_1\dots p_d)\) can be formulated as:

$$\begin{aligned} P(x\mid p_1\dots p_d)=\prod _{j=1}^d{p_j}^{\delta (x-j)} \end{aligned}$$
(A7)

where \(P(p_1\dots p_d\mid \alpha ) \sim \mathcal {D}(\alpha _{j})\).

Analogously, for a given tree T with K interior nodes, C branches per node, and branch weights B, the leaf probabilities in the Dirichlet-tree distribution can be represented as

$$\begin{aligned} P(x\mid B,T)=\prod _{k=1}^K\prod _{c=1}^Cb^{\delta _{kc}(x)}_{kc} \end{aligned}$$
(A8)

where,

$$\begin{aligned} b_{kc}=\frac{\sum _{j=1}^d\delta _{kc}p_j}{\sum _{j=1}^d\delta _{k}p_j} \end{aligned}$$
(A9)

and,

$$\begin{aligned} \delta _{kc}={\left\{ \begin{array}{ll} 1,&{} \text {if branch { kc} leads to { x}}\\ 0,&{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(A10)

Similar to the NDD, each interior node defines another Dirichlet distribution over its descendant branches, as shown in (A5).

$$\begin{aligned} P(B\mid \alpha )=\prod _{k=1}^K P(b_k\mid \alpha ) \end{aligned}$$
(A11)

where \(P(b_k\mid \alpha )\sim \mathcal {D}(\alpha _{kc})\).

By combining (A2) with (A5) and multiplying by the Jacobian introduced by [23] and used in [24], the PDF of the Dirichlet-tree distribution can be written as:

$$\begin{aligned} P(x_1\dots x_d \mid \alpha ,T)=\prod _{j=1}^d x_j^{\alpha _{parent(j)}-1}\prod _{k=1}^K\frac{\Gamma (\sum _{c=1}^C\alpha _{kc})}{\prod _{c=1}^C\Gamma (\alpha _{kc})}\left( \sum _{j=1}^d\sum _{c=1}^C\delta _{kc}(j)\,x_j\right) ^{\beta _{k}} \end{aligned}$$
(A12)

where,

$$\begin{aligned} \beta _{k}={\left\{ \begin{array}{ll} 0,& \text {if } k \text { is the root node}\\ \alpha _{parent(k)}-\sum _{c=1}^C\alpha _{kc},& \text {otherwise} \end{array}\right. } \end{aligned}$$
(A13)
Fig. 10 Tree structures of the NDD and Dirichlet-tree distribution

In (A7), the tree is traversed from the original variables upwards, so all variables are handled in a single equation. Instead of treating each nesting independently as in the NDD derivation, the indicator \(\delta _{kc}\) determines which terms contribute, according to \(\varvec{\delta }\), a 3D array indexed by nodes, branches, and original variables. Consequently, \(\varvec{\delta }\) needs to be defined for all the original variables in advance, and any change to the tree structure requires modifying the \(\varvec{\delta }\) array.
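
As an illustration, the log of the Dirichlet-tree density (A12) can be evaluated directly from such a \(\delta \) array, as in the minimal sketch below; passing \(\alpha _{parent(j)}\) and \(\beta _k\) explicitly, and the argument names themselves, are assumptions made for brevity.

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_tree_logpdf(x, alpha, delta, alpha_leaf, beta):
    """Log of (A12), written directly in terms of the indicator array delta.

    x          : length-d vector of leaf proportions
    alpha      : (K, C) array of branch parameters alpha_{kc}
    delta      : (K, C, d) 0/1 array; delta[k, c, j] = 1 iff branch kc leads to leaf x_j
    alpha_leaf : length-d vector of alpha_{parent(j)}, the parameter of the branch ending at leaf j
    beta       : length-K vector of exponents beta_k as defined in (A13)
    """
    x, alpha = np.asarray(x, float), np.asarray(alpha, float)
    # leaf term: prod_j x_j^{alpha_parent(j) - 1}
    logp = np.sum((np.asarray(alpha_leaf) - 1.0) * np.log(x))
    # per-node Dirichlet normalizers: Gamma(sum_c alpha_kc) / prod_c Gamma(alpha_kc)
    logp += np.sum(gammaln(alpha.sum(axis=1)) - gammaln(alpha).sum(axis=1))
    # node masses: sum_j sum_c delta_kc(j) x_j, raised to the power beta_k
    node_mass = np.einsum('kcj,j->k', np.asarray(delta, float), x)
    logp += np.sum(np.asarray(beta) * np.log(node_mass))
    return logp
```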

Appendix B: Dirichlet-tree Mixture Model

B.1 Derivation of DTMM

In order to formulate the Dirichlet-tree distribution PDF, the tree shown in Fig. 1 needs to be redefined as in Fig. 14, to show the similarity between the NDD and the Dirichlet-tree distribution. The PDF shown in (B14) is the same as the one used in (A12), where \(\bar{\alpha }_j=\alpha _{parent(j)}\).

$$\begin{aligned} P(X\mid \alpha )=\prod _{j=1}^d x_j^{\bar{\alpha }_j-1}\prod _{k=1}^K\dfrac{\Gamma (\sum _{c=1}^C\alpha _{kc})}{\prod _{c=1}^C\Gamma (\alpha _{kc})}\left( \sum _{j=1}^d\sum _{c=1}^C\delta _{kc}(j)\,x_j\right) ^{\beta _{k}} \end{aligned}$$
(B14)

where (\(\delta _{kc}\)) and (\(\beta _{k}\)) are defined in Eqs. A10 and A13, respectively. Accordingly,

$$\begin{aligned} P(X\mid \theta )=\sum _{m=1}^{M}P(X\mid \vec {\alpha }_m)P(m), \end{aligned}$$
(B15)

where,

$$\begin{aligned} \theta =\{\vec {\alpha }_1\ldots ,\vec {\alpha }_M,P(1),\ldots ,P(M)\}, \end{aligned}$$

and,

$$\begin{aligned} \vec {\alpha }_m=(\alpha _{kcm},\ldots , \alpha _{KCm}). \end{aligned}$$

Thus, for (N) independent observations \(\mathcal {X}=\{X_1,\ldots ,X_N\}\),

$$\begin{aligned} P(\mathcal {X}\mid \theta )=\prod _{i=1}^{N} {\sum _{m=1}^{M}P(X_i\mid \vec {\alpha }_m)P(m)}. \end{aligned}$$
(B16)

Accordingly, the complete-data log-likelihood shown in (B17) can be formulated for the two cases distinguished in (A13).

$$\begin{aligned} \mathcal {L}(\mathcal {X},\mathcal {Z}\mid \theta )=\sum _{i=1}^{N} \sum _{m=1}^{M} z_{im}\log {[P(X_i\mid \vec {\alpha }_m)P(m)]}. \end{aligned}$$
(B17)
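
For illustration, the responsibilities \(z_{im}\) that appear in (B17) can be computed in the E-step as sketched below for a generic component log-density; the function signature and the log-domain normalization are assumptions of this sketch.

```python
import numpy as np

def e_step(log_pdf, X, alphas, weights):
    """Posterior responsibilities z_{im} for the mixture in (B15)-(B16).

    log_pdf : callable(x, alpha) returning log P(x | alpha), e.g. an NDD or Dirichlet-tree log-density
    X       : sequence of N observations
    alphas  : list of M per-component parameter sets
    weights : length-M array of mixing proportions P(m)
    """
    N, M = len(X), len(weights)
    log_r = np.empty((N, M))
    for m in range(M):
        log_r[:, m] = np.log(weights[m]) + np.array([log_pdf(x, alphas[m]) for x in X])
    # normalize the rows in the log domain for numerical stability
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)
```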

Based on Fig. 14, for the root node \((k=1)\):

$$\begin{aligned} \mathcal {L}(\mathcal {X},\mathcal {Z}\mid \theta )=\sum _{i=1}^{N}\sum _{m=1}^{M}z_{im}\Bigg [\sum _{j=1}^d\Bigg [\bar{\alpha }_{j}\log {x_j}-\log {x_j}+\sum _{k=1}^K\Bigg [\log {\Gamma \Big (\sum _{c=1}^C\alpha _{kc}\Big )}-\sum _{c=1}^C\log {\Gamma (\alpha _{kc})}\Bigg ]\Bigg ]+\log {P(m)}\Bigg ] \end{aligned}$$
(B18)

For non-root nodes \((k=[2,3])\):

$$\begin{aligned} \mathcal {L}(\mathcal {X},\mathcal {Z}\mid \theta )=\sum _{i=1}^N\sum _{m=1}^M z_{im}\Bigg [\sum _{j=1}^d\Bigg [\bar{\alpha }_{j}\log {x_j}-\log {x_j}+\sum _{k=1}^K\Bigg [\log {\Gamma \Big (\sum _{c=1}^C\alpha _{kc}\Big )}-\sum _{c=1}^C\log {\Gamma (\alpha _{kc})}+\bar{\alpha }_k\log {\Big (\sum _{j=1}^d\sum _{c=1}^C\delta _{kc}(j)x_j\Big )}-\sum _{c=1}^C\alpha _{kc}\log {\Big (\sum _{j=1}^d\sum _{c=1}^C\delta _{kc}(j)x_j\Big )}\Bigg ]\Bigg ]+\log {P(m)}\Bigg ] \end{aligned}$$
(B19)

From the above equations, the relation between \(\alpha _{kc}\) and \(\bar{\alpha }_k\) can be observed, since \(\alpha _{kc} = \bar{\alpha }_k\) for specific values of \(k\). This relation is determined by the tree structure, hence the importance of \(\delta _{kc}\) for defining it; for example, according to Fig. 14, \(\alpha _{11} = \bar{\alpha }_2\). This makes each tree structure distinct from the others, so each structure requires its own model derivation. In this paper, the derivations are made with respect to the tree shown in Fig. 14. To estimate the parameters within the EM framework, the Newton-Raphson method in (26) is used for the distribution parameters, while (25) defines the prior parameters. For non-root nodes \((k=[2,3],c=[1,2])\), parameter estimation is required for both \(\alpha _{kc}\) and \(\bar{\alpha }_k\) while using the deterministic annealing extension:

$$\begin{aligned} \frac{\partial \Theta (\theta ^*,\theta ,T)}{\partial \alpha _{kcm}^{(t)}}=\sum _{i=1}^N z_{im}\Bigg [\log {x_{kc}}+\Psi \Big (\sum _{c=1}^C\alpha _{kcm}\Big )-\Psi (\alpha _{kcm})\Bigg ] \end{aligned}$$
(B20)
$$\begin{aligned} \frac{\partial \Theta (\theta ^*,\theta ,T)}{\partial \bar{\alpha }_{km}^{(t)}}=\sum _{i=1}^Nz_{im}\left( \sum _{j=1}^d\sum _{c=1}^C\delta _{kc}(j)x_j\right) \end{aligned}$$
(B21)

while the parameter estimation for the root node \((k=1,c=[1,2])\) is only required for \(\alpha _{kc}\), as it has no parent node.

$$\begin{aligned} \frac{\partial \Theta (\theta ^*,\theta ,T)}{\partial \alpha _{kcm}^{(t)}}=\sum _{i=1}^Nz_{im}\left[ \Psi (\sum _{c=1}^C\alpha _{kcm})-\Psi (\alpha _{kcm})\right] . \end{aligned}$$
(B22)

However, the parent node of \(k=[2,3]\) is the node \(k=1\). Therefore, (B21) and (B22) can be combined for \(\alpha _{kc}\) with \(k=1\). Thus:

$$\begin{aligned} \frac{\partial \Theta (\theta ^*,\theta ,T)}{\partial \alpha _{kcm}^{(t)}}:\forall k \in [2,3]=\sum _{i=1}^N z_{im}\Bigg [\log {x_{kc}}+\Psi \Big (\sum _{c=1}^C\alpha _{kcm}\Big )-\Psi (\alpha _{kcm})\Bigg ] \end{aligned}$$
(B23)
$$\begin{aligned} \frac{\partial \Theta (\theta ^*,\theta ,T)}{\partial \alpha _{kcm}^{(t)}}:\forall k \in [1]=\sum _{i=1}^N z_{im}\Bigg [\Big (\sum _{j=1}^d\sum _{c=1}^C\delta _{kc}(j)x_j\Big )+\Psi \Big (\sum _{c=1}^C\alpha _{kcm}\Big )-\Psi (\alpha _{kcm})\Bigg ] \end{aligned}$$
(B24)

To define the Hessian matrix,

$$\begin{aligned} \frac{\partial ^2 \Theta (\theta ^*,\theta ,T)}{\partial \alpha _{kc_1m}^{(t)} \alpha _{kc_2m}^{(t)}} = \sum _{i=1}^Nz_{im}\left[ \Psi ^\prime (\sum _{c=1}^C\alpha _{kcm})\right] \end{aligned}$$
(B25)
$$\begin{aligned} \frac{\partial ^2 \Theta (\theta ^*,\theta ,T)}{\partial {\alpha _{kc_m}^{(t)}}^2} = \sum _{i=1}^Nz_{im}\left[ \Psi ^\prime (\sum _{c=1}^C\alpha _{kcm})-\Psi ^\prime (\alpha _{kcm})\right] \end{aligned}$$
(B26)
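
A single Newton-Raphson update for the branch parameters of one node can then be sketched as follows, assembling the Hessian from (B25) and (B26); the per-node blocking and the step-halving safeguard that keeps the parameters positive are assumptions of this sketch, not steps prescribed above.

```python
import numpy as np
from scipy.special import polygamma

def newton_step_node(alpha_k, grad_k, resp_sum):
    """One Newton-Raphson update for the C branch parameters alpha_{k1..kC} of node k
    in component m, with the Hessian assembled from (B25)-(B26).

    alpha_k  : length-C parameter vector
    grad_k   : length-C gradient, e.g. (B23) or (B24) evaluated at alpha_k
    resp_sum : sum_i z_{im}, the total responsibility of component m
    """
    alpha_k = np.asarray(alpha_k, float)
    C = len(alpha_k)
    trigamma = lambda a: polygamma(1, a)
    # off-diagonal entries share the term of (B25); the diagonal adds the correction of (B26)
    H = resp_sum * (trigamma(alpha_k.sum()) * np.ones((C, C)) - np.diag(trigamma(alpha_k)))
    step = np.linalg.solve(H, np.asarray(grad_k, float))
    new_alpha = alpha_k - step
    # keep the parameters positive; halve the step until the update stays feasible
    while np.any(new_alpha <= 0):
        step *= 0.5
        new_alpha = alpha_k - step
    return new_alpha
```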

Finally, \(\delta _{kc}(j)\) is a \((K\times C\times d)\) array defined by the tree structure. Based on Fig. 14, \(\delta _{kc}(j)\) is therefore a \((3\times 2\times 4)\) array and can be written as:

$$\begin{aligned} \delta (1)=\begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 0 \end{bmatrix},\quad \delta (2)=\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix},\quad \delta (3)=\begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 1 & 0 \end{bmatrix},\quad \delta (4)=\begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 0 & 1 \end{bmatrix} \end{aligned}$$
(B27)
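
A minimal sketch of how this \((3\times 2\times 4)\) array can be stored is given below, using 0-based indices; the `delta[k, c, j]` layout is an assumed encoding chosen so that each slice over the last axis reproduces one of the matrices in (B27).

```python
import numpy as np

# delta[k, c, j] = 1 iff branch (k, c) lies on the path from the root to leaf x_j,
# for the 3-node, 2-branch tree of Fig. 14 (all indices 0-based).
delta = np.zeros((3, 2, 4), dtype=int)
delta[0, 0, [0, 1]] = 1   # root branch 1 leads to x_1 and x_2
delta[0, 1, [2, 3]] = 1   # root branch 2 leads to x_3 and x_4
delta[1, 0, 0] = 1        # node 2, branch 1 -> x_1
delta[1, 1, 1] = 1        # node 2, branch 2 -> x_2
delta[2, 0, 2] = 1        # node 3, branch 1 -> x_3
delta[2, 1, 3] = 1        # node 3, branch 2 -> x_4

# each slice delta[:, :, j] reproduces delta(j+1) of (B27), e.g.
print(delta[:, :, 0])     # [[1 0], [1 0], [0 0]]
```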

B.2 Results verification

This section compares the results of the NDMM and DTMM against the DMM and the original parameter values under the same initialization hyperparameters. Table 12 shows that the NDMM and DTMM yield the same results, confirming that the two formulations are equivalent.

Table 12 Comparison between the yielded \(\alpha \) parameters for the DMM, NDMM and DTMM

Appendix C: Proof of exponential properties of NDD

In this section, the pertinence of the NDD to the exponential family is proven, along with the properties that allow the Bhattacharyya kernel and the Kullback-Leibler divergence to be obtained in closed form.

C.1 Proof of exponential pertinence

First, the exponential of the natural logarithm is taken for the PDF of the NDD as shown in (C28).

$$\begin{aligned} P(X\mid \theta )&=\exp {[\log P(X\mid \theta )]} \\ &=\frac{\prod _{j=1}^d x_j^{\alpha _j-1} \prod _{k=1}^K x_{d+k}^{\alpha _{d+k}-\bar{A}_k}}{\prod _{k=0}^K \frac{\prod _{j \in k} \Gamma \left( \alpha _j\right) }{\Gamma \left( \sum _{j \in k}\alpha _j\right) }} \end{aligned}$$
(C28)

To verify that the distribution belongs to the exponential family, it needs to be written in the form shown in (C29):

$$\begin{aligned} P(X\mid \theta )=\exp {[A(X)+\theta ^T T(X)-K(\theta )]} \end{aligned}$$
(C29)

where A is the measure function, T is the sufficient statistics, and K is the cumulant generating function [36]. Therefore, by expanding (C28),

$$\begin{aligned} P(X \mid \theta )&=\exp \Bigg [\sum _{j=1}^d\alpha _j\log {x_j}-\sum _{j=1}^d \log {x_j}+\sum _{k=1}^K\alpha _{d+k}\log {x_{d+k}}-\sum _{k=1}^K\bar{A}_k\log {x_{d+k}}+\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}\alpha _{j}\Big )}-\sum _{k=0}^K\sum _{j\in k}\log {\Gamma (\alpha _j)} \Bigg ]\\ &=\exp \Bigg [\sum _{j=1}^d\alpha _j\log {x_j}-\sum _{j=1}^d \log {x_j}+\sum _{k=1}^K\alpha _{d+k}\log {x_{d+k}}-\sum _{k=1}^K\sum _{j\in k}\alpha _{j}\log {x_{d+k}}+\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}\alpha _{j}\Big )}-\sum _{k=0}^K\sum _{j\in k}\log {\Gamma (\alpha _j)} \Bigg ] \end{aligned}$$
(C30)

Finally, (C30) can be written as:

$$\begin{aligned} A(X)&=-\sum _{j=1}^d\log {x_j}\\ K(\theta )&=\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}\alpha _{j}\Big )}-\sum _{k=0}^K\sum _{j\in k}\log {\Gamma (\alpha _j)}\\ \theta &=(\alpha _{1},\dots ,\alpha _{d},\alpha _{d+1},\dots , \alpha _{d+K},\bar{A}_1,\dots ,\bar{A}_K)\\ &=\Big (\alpha _{1},\dots ,\alpha _{d},\alpha _{d+1},\dots , \alpha _{d+K},\sum _{j\in k(1)}\alpha _{j},\dots ,\sum _{j\in k(K)}\alpha _{j}\Big )\\ T(X)&=(\log {x_1},\dots ,\log {x_d},\log {x_{d+1}},\dots ,\log {x_{d+K}},\log {x_{d+1}},\dots ,\log {x_{d+K}}) \end{aligned}$$
(C31)

C.2 Bhattacharyya distance

$$\begin{aligned} BH(P(X \mid \theta ),P(X \mid \theta ^\prime ))=\exp {\left[ K\left( \dfrac{1}{2}\theta +\dfrac{1}{2}\theta ^\prime \right) -\dfrac{1}{2}K(\theta )-\dfrac{1}{2}K(\theta ^\prime )\right] } \end{aligned}$$
(C32)
$$\begin{aligned} BH(P(X \mid \theta ),P(X \mid \theta ^\prime ))&=\exp \Bigg [\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}\frac{\alpha _j+{\alpha _j}^\prime }{2}\Big )}-\sum _{k=0}^K\sum _{j\in k}\log {\Gamma \Big (\frac{\alpha _j+{\alpha _j}^\prime }{2}\Big )}-\frac{1}{2}\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}\alpha _j\Big )}+\frac{1}{2}\sum _{k=0}^K\sum _{j\in k}\log {\Gamma (\alpha _j)}-\frac{1}{2}\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}{\alpha _j}^\prime \Big )}+\frac{1}{2}\sum _{k=0}^K\sum _{j\in k}\log {\Gamma ({\alpha _j}^\prime )}\Bigg ]\\ &=\frac{\prod _{k=0}^K \Gamma \Big (\sum _{j \in k} \frac{\alpha _j+{\alpha _j}^{\prime }}{2}\Big ) \sqrt{\prod _{k=0}^K \prod _{j \in k} \Gamma \left( \alpha _j\right) \Gamma \left( {\alpha _j}^{\prime }\right) }}{\prod _{k=0}^K\prod _{j\in k} \Gamma \Big (\frac{\alpha _j+{\alpha _j}^{\prime }}{2}\Big ) \sqrt{\prod _{k=0}^K \Gamma \Big (\sum _{j \in k}{\alpha _j}\Big )\Gamma \Big (\sum _{j \in k} {\alpha _j}^{\prime }\Big )}} \end{aligned}$$
(C33)
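
The expression above can be evaluated with log-gamma functions. The sketch below is a direct transcription of (C32) with \(K(\theta )\) taken exactly as written in (C31); the per-node grouping of the parameters and the assumption that both NDDs share the same tree are conventions of this illustration.

```python
import numpy as np
from scipy.special import gammaln

def K_theta(node_params):
    """K(theta) as written in (C31): the sum over tree nodes of
    log Gamma(sum of the node's parameters) minus the sum of log Gamma of each parameter."""
    return sum(gammaln(np.sum(a)) - np.sum(gammaln(a)) for a in node_params)

def bhattacharyya_ndd(node_params_1, node_params_2):
    """Transcription of (C32): exp[ K(theta/2 + theta'/2) - K(theta)/2 - K(theta')/2 ].

    node_params_1, node_params_2 : lists of per-node parameter vectors A_0..A_K of the two NDDs,
    which are assumed to be defined on the same tree.
    """
    mid = [(np.asarray(a) + np.asarray(b)) / 2.0 for a, b in zip(node_params_1, node_params_2)]
    return np.exp(K_theta(mid) - 0.5 * K_theta(node_params_1) - 0.5 * K_theta(node_params_2))

# example with a two-level tree (root node A_0 plus two nested groups of two leaves)
p = [np.array([4.0, 6.0]), np.array([2.0, 3.0]), np.array([1.5, 2.5])]
q = [np.array([5.0, 5.0]), np.array([1.0, 4.0]), np.array([2.0, 2.0])]
print(bhattacharyya_ndd(p, q))
```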

C.3 Kullback-Leibler divergence

$$\begin{aligned} KL(P(X \mid \theta ),P(X \mid \theta ^\prime ))&=\int {p(x)\log {\frac{p(x)}{q(x)}}}\,dx=\left\langle \log {\frac{p(x)}{q(x)}}\right\rangle _{p(x)}\\ &=\left\langle \log {\frac{\dfrac{\prod _{j=1}^{d} {x_j}^{\alpha _j-1} \prod _{k=1}^{K} {x_{d+k}}^{\alpha _{d+k}-\bar{A}_k}}{\prod _{k=0}^{K} B(A_k)}}{\dfrac{\prod _{j=1}^{d} {x_j}^{\beta _j-1} \prod _{k=1}^{K} {x_{d+k}}^{\beta _{d+k}-\bar{B}_k}}{\prod _{k=0}^{K} B(B_k)}}}\right\rangle _{p(x)} \end{aligned}$$
(C34)

By Expansion,

$$\begin{aligned} KL(P(X \mid \theta ),P(X \mid \theta ^\prime ))&=\Bigg [\sum _{j=1}^d\alpha _j\log {x_j}-\sum _{j=1}^d \log {x_j}+\sum _{k=1}^K\alpha _{d+k}\log {x_{d+k}}-\sum _{k=1}^K\bar{A}_k\log {x_{d+k}}+\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}\alpha _{j}\Big )}-\sum _{k=0}^K\sum _{j\in k}\log {\Gamma (\alpha _j)} \Bigg ]\\ &\quad -\Bigg [\sum _{j=1}^d\beta _j\log {x_j}-\sum _{j=1}^d \log {x_j}+\sum _{k=1}^K\beta _{d+k}\log {x_{d+k}}-\sum _{k=1}^K\bar{B}_k\log {x_{d+k}}+\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}\beta _{j}\Big )}-\sum _{k=0}^K\sum _{j\in k}\log {\Gamma (\beta _j)} \Bigg ] \end{aligned}$$
(C35)
$$\begin{aligned} KL(P(X \mid \theta ),P(X \mid \theta ^\prime ))&=\sum _{j=1}^d(\alpha _j-\beta _j)\langle \log {x_j}\rangle _{p(x)}+\sum _{k=1}^K(\alpha _{d+k}+\bar{B}_k-\beta _{d+k}-\bar{A}_k)\langle \log {x_{d+k}}\rangle _{p(x)}\\ &\quad +\sum _{k=0}^K\left( \log {\Gamma \Big (\sum _{j\in k}\alpha _{j}\Big )}-\log {\Gamma \Big (\sum _{j\in k}\beta _{j}\Big )}\right) +\sum _{k=0}^K\sum _{j\in k}\left( \log {\Gamma (\beta _{j})}-\log {\Gamma (\alpha _{j})}\right) \end{aligned}$$
(C36)

where,

$$\begin{aligned} \langle \log {x_j}\rangle _{p(x)}&=\Psi (\alpha _{j})-\Psi \left( \sum _{j=1}^d\alpha _j\right) \\ \langle \log {x_{d+k}}\rangle _{p(x)}&=\Psi (\alpha _{d+k})-\Psi \left( \sum _{k=1}^K\alpha _{d+k}\right) \end{aligned}$$
(C37)
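
For completeness, the following sketch transcribes the closed form (C36) together with the expectations in (C37); the flat parameter layout, the `groups` encoding of the tree, and the assumption that both distributions share the same tree are conventions of this illustration.

```python
import numpy as np
from scipy.special import gammaln, psi

def ndd_kl(alpha, beta, groups):
    """Transcription of (C36) with the expectations of (C37).

    alpha, beta : length-(d+K) parameter vectors of the two NDDs (same tree assumed),
                  with the d leaf parameters first and the K nested-node parameters last
    groups      : list of K index lists; groups[k] = I_k, the leaves under nested node k
    """
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    K = len(groups)
    d = len(alpha) - K
    Abar = np.array([alpha[list(g)].sum() for g in groups])
    Bbar = np.array([beta[list(g)].sum() for g in groups])
    # expectations of (C37)
    e_log_leaf = psi(alpha[:d]) - psi(alpha[:d].sum())
    e_log_sum = psi(alpha[d:]) - psi(alpha[d:].sum())
    kl = np.sum((alpha[:d] - beta[:d]) * e_log_leaf)
    kl += np.sum((alpha[d:] + Bbar - beta[d:] - Abar) * e_log_sum)
    # node-wise log-Gamma terms, node 0 being the root with parameter set A_0
    nested = {j for g in groups for j in g}
    root = [j for j in range(d) if j not in nested] + list(range(d, d + K))
    for idx in [root] + [list(g) for g in groups]:
        kl += gammaln(alpha[idx].sum()) - gammaln(beta[idx].sum())
        kl += np.sum(gammaln(beta[idx]) - gammaln(alpha[idx]))
    return kl
```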

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Cite this article

Alkhawaja, F., Bouguila, N. Unsupervised nested Dirichlet finite mixture model for clustering. Appl Intell 53, 25232–25258 (2023). https://doi.org/10.1007/s10489-023-04888-8
