Unsupervised nested Dirichlet finite mixture model for clustering


Abstract

The Dirichlet distribution is widely used in the context of mixture models. Despite its flexibility, it suffers from limitations such as its restrictive covariance structure and the direct dependence between its mean and variance. In this work, a generalization of the Dirichlet distribution, namely the Nested Dirichlet distribution, is introduced in the context of finite mixture models; its hierarchical structure provides more flexibility and overcomes the aforementioned drawbacks. Model learning is based on the generalized expectation-maximization algorithm, where the parameters are initialized with the method of moments and estimated through the iterative Newton-Raphson method. Moreover, the minimum message length criterion is used to determine the number of components that best describes the data clusters under the finite mixture model. The Nested Dirichlet distribution is proven to belong to the exponential family, which offers several advantages, such as closed-form expressions for several probabilistic distances. The performance of the Nested Dirichlet mixture model is compared to the Dirichlet mixture model, the generalized Dirichlet mixture model, and a Convolutional Neural Network as a deep learning baseline. The effectiveness of the proposed framework is validated through this comparison on challenging datasets. The hierarchical nature of the model is further applied to challenging real-world tasks, namely hierarchical cluster analysis and hierarchical feature learning, showing a significant improvement in accuracy.

References

  1. UCI Machine Learning Repository: Optical recognition of handwritten digits data set. https://archive.ics.uci.edu/ml/datasets/optical%2Brecognition%2Bof%2Bhandwritten%2Bdigits

  2. Alalyan F, Zamzami N, Bouguila N (2019) Model-based hierarchical clustering for categorical data. In 2019 IEEE 28th International Symposium on Industrial Electronics (ISIE), pp 1424–1429. IEEE

  3. Alzubaidi L, Zhang J, Humaidi AJ, Al-Dujaili A, Duan Y, Al-Shamma O, Santamaría J, Fadhel MA, Al-Amidie M, Farhan L (2021) Review of deep learning: concepts, cnn architectures, challenges, applications, future directions. J Big Data 8:1–74

  4. Azam M, Bouguila N (2016) Speaker classification via supervised hierarchical clustering using ica mixture model. In International Conference on Image and Signal Processing, pp 193–202. Springer

  5. Baxter RA (2010) Minimum Message Length, pp 668–674. Springer US, Boston, MA. ISBN 978-0-387-30164-8. https://doi.org/10.1007/978-0-387-30164-8_542

  6. Bouguila N (2008) Clustering of count data using generalized dirichlet multinomial distributions. IEEE Trans Knowl Data Eng 20(4):462–474

  7. Bouguila N (2012) Hybrid generative/discriminative approaches for proportional data modeling and classification. IEEE Trans Knowl Data Eng 24(12):2184–2202. https://doi.org/10.1109/TKDE.2011.162

  8. Bouguila N (2012) Hybrid generative/discriminative approaches for proportional data modeling and classification. IEEE Trans Knowl Data Eng 24(12):2184–2202

  9. Bouguila N (2013) Deriving kernels from generalized dirichlet mixture models and applications. Inform Process Manag 49(1):123–137

  10. Bouguila N, Amayri O (2009) A discrete mixture-based kernel for svms: Application to spam and image categorization. Inform Process Manag 45(6):631–642

  11. Bouguila N, Fan W (2020) Mixture models and applications. Springer

  12. Bouguila N, Ziou D (2005) Mml-based approach for high-dimensional unsupervised learning using the generalized dirichlet mixture. In 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05)-Workshops, pp 53–53. IEEE

  13. Bouguila N, Ziou D (2007) High-dimensional unsupervised selection and estimation of a finite generalized dirichlet mixture model based on minimum message length. IEEE Trans Pattern Anal Mach Intell 29(10):1716–1731

  14. Bouguila N, Ziou D, Vaillancourt J (2004) Unsupervised learning of a finite mixture model based on the dirichlet distribution and its application. IEEE Trans Image Process 13(11):1533–1543

  15. Bourouis S, Bouguila N (2023) Expectation propagation learning of finite and infinite gamma mixture models and its applications. Multimed Tool Appl, pp 1–18

  16. Bourouis S, Zaguia A, Bouguila N, Alroobaea R (2018) Deriving probabilistic svm kernels from flexible statistical mixture models and its application to retinal images classification. IEEE Access 7:1107–1117

  17. Bourouis S, Alharbi A, Bouguila N (2021a) Bayesian learning of shifted-scaled dirichlet mixture models and its application to early covid-19 detection in chest x-ray images. J Imag, 7(1). ISSN 2313-433X. https://doi.org/10.3390/jimaging7010007. https://www.mdpi.com/2313-433X/7/1/7

  18. Bourouis S, Alharbi A, Bouguila N (2021b) Bayesian learning of shifted-scaled dirichlet mixture models and its application to early covid-19 detection in chest x-ray images. J Imag, 7(1). ISSN 2313-433X. https://doi.org/10.3390/jimaging7010007. https://www.mdpi.com/2313-433X/7/1/7

  19. Campbell M, Galligan M, Saldova R, Rudd P, Murphy T (2011) Application of compositional models for glycan hilic data. 01

  20. Carpenter B (2010) Integrating out multinomial parameters in latent dirichlet allocation and naive bayes for collapsed gibbs sampling. Rapport Technique 4:464

  21. Chen J, Gong Z, Liu W (2019) A nonparametric model for online topic discovery with word embeddings. Inform Sci, 504:32–47. ISSN 0020-0255. https://doi.org/10.1016/j.ins.2019.07.048. https://www.sciencedirect.com/science/article/pii/S0020025519306541

  22. Chen J, Gong Z, Liu W (2020) A dirichlet process biterm-based mixture model for short text stream clustering. Appl Intell 50:1609–1619

  23. Connor RJ, Mosimann JE (1969) Concepts of independence for proportions with a generalization of the dirichlet distribution. J Amer Stat Assoc 64(325):194–206

  24. Dennis SY III (1991) On the hyper-dirichlet type 1 and hyper-liouville distributions. Communications in Statistics-Theory and Methods 20(12):4069–4081

  25. Epaillard E, Bouguila N (2016) Proportional data modeling with hidden markov models based on generalized dirichlet and beta-liouville mixtures applied to anomaly detection in public areas. Pattern Recogn 55:125–136

  26. Fisher RA. Iris dataset. https://archive-beta.ics.uci.edu/ml/datasets/iris

  27. Genovese CR, Wasserman L (2000) Rates of convergence for the Gaussian mixture sieve. Annal Stat 28(4):1105–1127

  28. Goodfellow I, Bengio Y, Courville A (2016) Deep learning. MIT press

  29. Graybill FA (1983) Matrices with applications in statistics. p 461

  30. Guo Z, Wang ZJ (2012) An unsupervised hierarchical feature learning framework for one-shot image recognition. IEEE Trans Multimed 15(3):621–632

  31. Hamdi M, Hilali-Jaghdam I, Elnaim BE, Elhag AA (2023) Forecasting and classification of new cases of covid 19 before vaccination using decision trees and gaussian mixture model. Alexandria Eng J, 62:327–333. ISSN 1110-0168. https://doi.org/10.1016/j.aej.2022.07.011. https://www.sciencedirect.com/science/article/pii/S111001682200463X

  32. Harris C, Stephens M et al (1988) A combined corner and edge detector. In Alvey vision conference, vol 15, pp 10–5244. Citeseer

  33. Ihou KE, Bouguila N (2017) A new latent generalized dirichlet allocation model for image classification. In 2017 Seventh International Conference on Image Processing Theory, Tools and Applications (IPTA), pp 1–6. IEEE

  34. Ishak S, Chowdary AJ (2021) Evaluating robustness of a cnn architecture introduced to the adversarial attacks

  35. Janosi A, Steinbrunn W, Pfisterer M, Detrano R (1988) Heart Disease. UCI Machine Learning Repository

  36. Jebara T, Kondor R, Howard A (2004) Probability product kernels. J Mach Learn Res 5:819–844

  37. Jefferys WH, Berger JO (1992) Ockham’s razor and bayesian analysis. American Scientist 80(1):64–72

  38. Qiang J, Yanfeng S, Junbin G, Hu Y, Baocai Y (2022) A decoder-free variational deep embedding for unsupervised clustering. IEEE Trans Neural Netw Learn Syst 33(10):5681–5693. https://doi.org/10.1109/TNNLS.2021.3071275

  39. Jiang Z, Zheng Y, Tan H, Tang B, Zhou H (2017) Variational deep embedding: an unsupervised and generative approach to clustering

  40. Kavitha R, Jothi DK, Saravanan K, Swain MP, Gonzáles JLA, Bhardwaj RJ, Adomako E et al (2023) Ant colony optimization-enabled cnn deep learning technique for accurate detection of cervical cancer. BioMed Res Int, 2023

  41. Kondor R, Jebara T (2003) A kernel between sets of vectors. In Proceedings of the 20th international conference on machine learning (ICML-03), pp 361–368

  42. Koslovsky MD, Vannucci M (2020) Microbvs: Dirichlet-tree multinomial regression models with bayesian variable selection-an r package. BMC Bioinformatics 21:1–10

  43. Krizhevsky A, Nair V, Hinton G. CIFAR-10 (Canadian Institute for Advanced Research). http://www.cs.toronto.edu/~kriz/cifar.html

  44. Kullback S, Leibler RA (1951) On information and sufficiency. The Annal Math Stat 22(1):79–86

  45. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In 2006 IEEE computer society conference on computer vision and pattern recognition (CVPR’06), vol 2, pp 2169–2178. IEEE

  46. Liu S, Huang D, Wang Y (2019) Adaptive nms: Refining pedestrian detection in a crowd. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 6459–6468

  47. Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110

  48. Lyu S, Wang Z, Ling C, Chen H (2022) Better lattice quantizers constructed from complex integers. IEEE Trans Commun 70(12):7932–7940

  49. Mangaroska K, Martinez-Maldonado R, Vesin B, Gašević D (2021) Challenges and opportunities of multimodal data in human learning: The computer science students’ perspective. J Comput Assisted Learn, 37(4):1030–1047. https://doi.org/10.1111/jcal.12542

  50. Manouchehri N, Bouguila N (2019) A probabilistic approach based on a finite mixture model of multivariate beta distributions. ICEIS 1:373–380

  51. Masoudnia S, Mersa O, Araabi BN, Vahabie A-H, Sadeghi MA, Ahmadabadi MN (2019) Multi-representational learning for offline signature verification using multi-loss snapshot ensemble of cnns. Expert Syst Appl, 133:317–330. ISSN 0957-4174. https://doi.org/10.1016/j.eswa.2019.03.040. https://www.sciencedirect.com/science/article/pii/S0957417419301666

  52. McLachlan GJ, Lee SX, Rathnayake SI (2019) Finite mixture models. Annual Rev Stat Appl 6(1):355–378

  53. McLachlan GJ, Lee SX, Rathnayake SI (2019) Finite mixture models. Annual Rev Stat Appl 6(1):355–378. https://doi.org/10.1146/annurev-statistics-031017-100325

  54. McLachlan GJ, Lee SX, Rathnayake SI (2019) Finite mixture models. Annual Rev Stat Appl 6(1):355–378. https://doi.org/10.1146/annurev-statistics-031017-100325

  55. Minka TP (2000) Estimating a dirichlet distribution. https://tminka.github.io/papers/dirichlet/minka-dirichlet.pdf

  56. Minka T (1999) The dirichlet-tree distribution. https://www.microsoft.com/en-us/research/publication/dirichlet-tree-distribution/

  57. Monti GS, Mateu-Figueras G, Pawlowsky-Glahn V (2011) Notes on the Scaled Dirichlet Distribution, chapter 10, pp 128–138. John Wiley & Sons, Ltd. ISBN 9781119976462. https://doi.org/10.1002/9781119976462.ch10

  58. Moradi S, Zayed T, Nasiri F, Golkhoo F (2020) Automated anomaly detection and localization in sewer inspection videos using proportional data modeling and deep learning-based text recognition. J Infrastructure Syst 26(3):04020018. https://doi.org/10.1061/(ASCE)IS.1943-555X.0000553

  59. Munkhammar J, Mattsson L, Rydén J (2017) Polynomial probability distribution estimation using the method of moments. PLoS ONE 12(4):e0174573

  60. Najar F, Bouguila N (2020) Image categorization using agglomerative clustering based smoothed dirichlet mixtures. In International Symposium on Visual Computing, pp 27–38. Springer

  61. Najar F, Bouguila N (2022) Emotion recognition: A smoothed dirichlet multinomial solution. Eng Appl Artif Intell, 107:104542. ISSN 0952-1976. https://doi.org/10.1016/j.engappai.2021.104542. https://www.sciencedirect.com/science/article/pii/S0952197621003900

  62. Nasfi R, Amayri M, Bouguila N (2020) A novel approach for modeling positive vectors with inverted dirichlet-based hidden markov models. Knowledge-Based Syst 192:105335

  63. Null B (2008) The nested dirichlet distribution: properties and applications. 11

  64. Null B (2009) Modeling baseball player ability with a nested dirichlet distribution. J Quantitative Anal Sports 5:5–5

  65. Oboh BS, Bouguila N (2017) Unsupervised learning of finite mixtures using scaled dirichlet distribution and its application to software modules categorization. In 2017 IEEE international conference on industrial technology (ICIT), pp 1085–1090. IEEE

  66. Ombabi AH, Ouarda W, Alimi AM (2020) Deep learning cnn-lstm framework for arabic sentiment analysis using textual information shared in social networks. Social Netw Anal Mining 10:1–13

  67. Palaz D, Magimai-Doss M, Collobert R (2019) End-to-end acoustic modeling using convolutional neural networks for hmm-based automatic speech recognition. Speech Commun, 108:15–32. ISSN 0167-6393. https://doi.org/10.1016/j.specom.2019.01.004. https://www.sciencedirect.com/science/article/pii/S0167639316301625

  68. Patrício M, Caramelo F, Seiça R, Matafome P, Crisóstomo J, Pereira J. Breast cancer coimbra data set. https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Coimbra

  69. Cinelli LP, Marins MA, Barros da Silva EA, Netto SL (2021) Variational Autoencoder, pp 111–149. Springer International Publishing, Cham

  70. Rahman MH, Bouguila N (2021) Efficient feature mapping in classifying proportional data. IEEE Access 9:3712–3724. https://doi.org/10.1109/ACCESS.2020.3047536

  71. Rissanen J (1978) Modeling by shortest data description. Automatica 14(5):465–471

  72. Roberts SJ, Husmeier D, Rezek I, Penny W (1998) Bayesian approaches to gaussian mixture modeling. IEEE Trans Pattern Anal Mach Intell 20:1133–1142

  73. Ronning G (1989) Maximum likelihood estimation of dirichlet distributions. J Stat Comput Simul 32(4):215–221

  74. Sakamoto Y, Ishiguro M, Kitagawa G (1986) Akaike information criterion statistics. Dordrecht, The Netherlands: D. Reidel, 81(10.5555):26853

  75. Samsami M, Wagner R (2021) Investment decisions with endogeneity: A dirichlet tree analysis. J Risk Financial Manag 14(7):299

  76. Sharma P, Berwal YPS, Ghai W (2020) Performance analysis of deep learning cnn models for disease detection in plants using image segmentation. Inform Process Agriculture, 7(4):566–574. ISSN 2214-3173. https://doi.org/10.1016/j.inpa.2019.11.001. https://www.sciencedirect.com/science/article/pii/S2214317319301957

  77. Singh JP, Bouguila N (2018) Intrusion detection using unsupervised approach. In Emerging Technologies for Developing Countries: First International EAI Conference, AFRICATEK 2017, Marrakech, Morocco, March 27-28, 2017 Proceedings 1st, pp 192–201. Springer

  78. Singhal A, Singh P, Lall B, Joshi SD (2020) Modeling and prediction of covid-19 pandemic using gaussian mixture model. Chaos, Solitons Fractals 138:110023

    Article  MathSciNet  Google Scholar 

  79. Wallace CS (2005) Statistical and inductive inference by minimum message length. Springer

  80. Wang T, Zhao H (2021) Statistical methods for analyzing tree-structured microbiome data. Statistical Analysis of Microbiome Data, pp 193–220

  81. Weakliem DL (1999) A critique of the bayesian information criterion for model selection. Sociological Method Res 27(3):359–397

  82. Yang L, Fan W, Bouguila N (2021) Deep clustering analysis via dual variational autoencoder with spherical latent embeddings. IEEE Transactions on Neural Networks and Learning Systems, pp 1–10. https://doi.org/10.1109/TNNLS.2021.3135460

  83. Yang L, Fan W, Bouguila N (2022) Clustering analysis via deep generative models with mixture models. IEEE Trans Neural Netw Learn Syst 33(1):340–350. https://doi.org/10.1109/TNNLS.2020.3027761

  84. Yin H (2010) Scene classification using spatial pyramid matching and hierarchical dirichlet processes

  85. Yin J, Chao D, Liu Z, Zhang W, Yu X, Wang J (2018) Model-based clustering of short text streams. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’18, pp 2634–2642, New York, NY, USA. Association for Computing Machinery. ISBN 9781450355520. https://doi.org/10.1145/3219819.3220094

  86. Zamzami N, Bouguila N (2019) Model selection and application to high-dimensional count data clustering. Appl Intell 49(4):1467–1488

  87. Zamzami N, Bouguila N (2019b) A novel scaled dirichlet-based statistical framework for count data modeling: Unsupervised learning and exponential approximation. Pattern Recogn, 95:36–47. ISSN 0031-3203. https://doi.org/10.1016/j.patcog.2019.05.038. https://www.sciencedirect.com/science/article/pii/S0031320319302237

  88. Zamzami N, Bouguila N (2019) Hybrid generative discriminative approaches based on multinomial scaled dirichlet mixture models. Appl Intell 49(11):3783–3800. https://doi.org/10.1007/s10489-019-01437-0

  89. Zamzami N, Bouguila N (2019d) An accurate evaluation of msd log-likelihood and its application in human action recognition. In 2019 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pp 1–5. https://doi.org/10.1109/GlobalSIP45357.2019.8969324

  90. Zamzami N, Bouguila N (2020) High-dimensional count data clustering based on an exponential approximation to the multinomial beta-liouville distribution. Inform Sci 524:116–135

  91. Ziou D, Bouguila N (2004) Unsupervised learning of a finite gamma mixture using MML: application to SAR image analysis. In 17th International Conference on Pattern Recognition, ICPR 2004, Cambridge, UK, pp 68–71. IEEE Computer Society

Funding

No funding was received to assist with the preparation of this manuscript.

Author information

Corresponding author

Correspondence to Fares Alkhawaja.

Ethics declarations

Conflicts of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: The connection between NDD and Dirichlet-tree distributions

The derivation of the PDF of both the NDD [63] and the Dirichlet-tree distribution [56] is based on the distribution introduced in [24]. Both distributions are built on representing the Dirichlet distribution as a finite stochastic process, generalizing its covariance structure.

A.1 Derivation of NDD

The NDD can be viewed as a representation of the original variables of the Dirichlet distribution together with the nesting variables, where each nesting level also follows a Dirichlet distribution. Therefore, the NDD can be derived directly from the PDF of the Dirichlet distribution, with (\(x_{d+1},x_{d+2},\ldots ,x_{d+K}\)) determined by (\(x_1,x_2,\ldots ,x_d\)).

Consequently,

$$\begin{aligned} P(x_1,x_2,\ldots ,x_{d+K})&=P(x_1,x_2,\ldots ,x_d)\,P(x_{d+1},x_{d+2},\ldots ,x_{d+K}\mid x_1,x_2,\ldots ,x_d)\\ &=P(x_1,x_2,\ldots ,x_d) \end{aligned}$$
(A1)

By denoting \(X_0\sim \mathcal {D}(\alpha _{0})\) for the unnested variables (root node variables) and \(X_k \ \forall k>0\), where \(X_k=\tilde{X}_k x_{d+k}\) given that \(\tilde{X}_k\sim \mathcal {D}(\alpha _{k})\), the following expression can be written:

$$\begin{aligned} P(x_1,x_2,\ldots ,x_d)&=P(X_0,X_1,\ldots ,X_K)\\ &=P(X_0)\,P(X_1,\ldots ,X_K\mid X_0) \qquad \textit{(using the chain rule)}\\ &=P(X_0)\,P(X_1\mid X_0)\,P(X_2\mid X_0,X_1)\cdots P(X_K\mid X_0,X_1,\ldots ,X_{K-1}) \qquad \textit{(using the notation above)}\\ &=P(X_0)\,P(X_1\mid x_{d+1})\,P(X_2\mid x_{d+2})\cdots P(X_K\mid x_{d+K}) \end{aligned}$$
(A2)

Due to the change of variable,

$$\begin{aligned} P(X_k\mid x_{d+k})&=P(\tilde{X}_k x_{d+k}\mid x_{d+k})\,|\textbf{J}_{X_k \longrightarrow \tilde{X}_k}|\\ &=P(\tilde{X}_k)\,|\textbf{J}_{X_k \longrightarrow \tilde{X}_k}| \end{aligned}$$
(A3)

where \(|\textbf{J}_{X_k \longrightarrow \tilde{X}_k}|\) is the determinant of Jacobian matrix derived in [23]. The Jacobian matrix is added as a result of the change of variable rule. According to [23],

$$\begin{aligned} |\textbf{J}_{X_k \longrightarrow \tilde{X}_k}|=\begin{vmatrix} \frac{1}{x_{d+k}} & 0 & \cdots & 0 \\ 0 & \frac{1}{x_{d+k}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{x_{d+k}} \end{vmatrix}=\left( \frac{1}{x_{d+k}}\right) ^{|X_k|-1} \end{aligned}$$
(A4)

Therefore,

$$\begin{aligned} P(X_k\mid x_{d+k})=P(\tilde{X}_k)\left( \frac{1}{x_{d+k}}\right) ^{|X_k|-1} \end{aligned}$$
(A5)
$$\begin{aligned} P(x_1,x_2,\ldots ,x_{d+K})&=P(X_0)\prod _{k=1}^K P(\tilde{X}_k)\left( \frac{1}{x_{d+k}}\right) ^{|X_k|-1}\\ &=\frac{1}{B\left( A_0\right) }\left( \prod _{j \in I_0} x_j^{\alpha _j-1}\right) \left( \prod _{k=1}^K \frac{1}{B\left( A_k\right) }\left( \prod _{j \in I_k}\left( \frac{x_j}{x_{d+k}}\right) ^{\alpha _j-1}\right) \left( \frac{1}{x_{d+k}}\right) ^{|X_k|-1}\right) \\ &=\left( \prod _{k=0}^K \frac{1}{B\left( A_k\right) }\right) \left( \prod _{j=1}^{d+K} x_j^{\alpha _j-1}\right) \left( \prod _{k=1}^K \left( \frac{1}{x_{d+k}}\right) ^{\left( \sum _{j \in I_k}\alpha _j\right) -1}\right) \\ &=\left( \prod _{k=0}^K \frac{1}{B\left( A_k\right) }\right) \left( \prod _{j=1}^{d+K} x_j^{\alpha _j-1}\right) \left( \prod _{k=1}^K \left( \frac{1}{x_{d+k}}\right) ^{\bar{A}_k-1}\right) \\ &=\left( \prod _{k=0}^K \frac{1}{B\left( A_k\right) }\right) \left( \prod _{j=1}^{d+K} x_j^{\alpha _j-1}\right) \left( \prod _{k=1}^K x_{d+k}^{1-\bar{A}_k}\right) \\ &=\left( \prod _{k=0}^K \frac{1}{B\left( A_k\right) }\right) \left( \prod _{j=1}^d x_j^{\alpha _j-1}\right) \left( \prod _{k=1}^K x_{d+k}^{\alpha _{d+k}-\bar{A}_k}\right) \\ &=\frac{\left( \prod _{j=1}^d x_j^{\alpha _j-1}\right) \left( \prod _{k=1}^K x_{d+k}^{\alpha _{d+k}-\bar{A}_k}\right) }{\prod _{k=0}^K B\left( A_k\right) } \end{aligned}$$
(A6)
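
As a concrete illustration of the closed form in (A6), the following minimal sketch evaluates the NDD log-density for a two-level tree; the array layout, the `groups` encoding of the index sets \(I_k\), and the function names are assumptions of this sketch rather than notation from the derivation above.

```python
import numpy as np
from scipy.special import gammaln

def log_B(a):
    # log of the multivariate beta function: B(A) = prod_j Gamma(a_j) / Gamma(sum_j a_j)
    return np.sum(gammaln(a)) - gammaln(np.sum(a))

def ndd_logpdf(x, alpha, groups):
    """Log-density of (A6) for a two-level nested Dirichlet.

    x      : length d+K array, the d leaves x_1..x_d followed by the K nested sums x_{d+1..d+K}
    alpha  : length d+K array, one parameter per tree node below the root
    groups : list of K index lists; groups[k] holds the leaf indices I_k under nested node k
    """
    x, alpha = np.asarray(x, float), np.asarray(alpha, float)
    K = len(groups)
    d = len(x) - K
    nested = {j for g in groups for j in g}
    # root Dirichlet A_0: unnested leaves plus the K nested-sum nodes
    A0 = np.array([alpha[j] for j in range(d) if j not in nested] + list(alpha[d:]))
    # numerator: prod_j x_j^{alpha_j - 1} * prod_k x_{d+k}^{alpha_{d+k} - Abar_k}
    log_num = np.sum((alpha[:d] - 1.0) * np.log(x[:d]))
    log_den = log_B(A0)
    for k, I_k in enumerate(groups):
        A_k = alpha[list(I_k)]
        log_num += (alpha[d + k] - A_k.sum()) * np.log(x[d + k])
        log_den += log_B(A_k)
    return log_num - log_den

# example: d = 4 leaves, K = 2 nested nodes grouping (x_1, x_2) and (x_3, x_4)
x = np.array([0.1, 0.2, 0.3, 0.4, 0.3, 0.7])   # the last two entries are the nested sums
alpha = np.array([2.0, 3.0, 1.5, 2.5, 4.0, 6.0])
print(ndd_logpdf(x, alpha, groups=[[0, 1], [2, 3]]))
```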

A.2 Derivation of Dirichlet-tree distribution

In [56], the function (\(\delta \)) is introduced to control the contribution of each branch to the associated nodes as illustrated in this section.

The probability for a certain sample (x) for a set of leaf probabilities \((p_1\dots p_d)\) can be formulated as:

$$\begin{aligned} P(x\mid p_1\dots p_d)=\prod _{j=1}^d{p_j}^{\delta (x-j)} \end{aligned}$$
(A7)

where \(P(p_1\dots p_d\mid \alpha ) \sim \mathcal {D}(\alpha _{j})\).

Analogously, for a given tree T with K interior nodes, C branches per node, and branch weights B, the leaf probabilities in the Dirichlet-tree distribution can be represented as

$$\begin{aligned} P(x\mid B,T)=\prod _{k=1}^K\prod _{c=1}^Cb^{\delta _{kc}(x)}_{kc} \end{aligned}$$
(A8)

where,

$$\begin{aligned} b_{kc}=\frac{\sum _{j=1}^d\delta _{kc}p_j}{\sum _{j=1}^d\delta _{k}p_j} \end{aligned}$$
(A9)

and,

$$\begin{aligned} \delta _{kc}={\left\{ \begin{array}{ll} 1,&{} \text {if branch { kc} leads to { x}}\\ 0,&{} \text {otherwise} \end{array}\right. } \end{aligned}$$
(A10)

Similar to the NDD, each interior node defines another Dirichlet distribution over its descendant branches, as shown in (A5).

$$\begin{aligned} P(B\mid \alpha )=\prod _{k=1}^K P(b_k\mid \alpha ) \end{aligned}$$
(A11)

where \(P(b_k\mid \alpha )\sim \mathcal {D}(\alpha _{kc})\).

By combining (A2) with (A5) and multiplying by the Jacobian introduced by [23] and used in [24], the PDF of the Dirichlet-tree distribution can be written as:

$$\begin{aligned} P(x_1\dots x_d \mid \alpha ,T)=\prod _{j=1}^d x_j^{\alpha _{parent(j)}-1}\prod _{k=1}^K\frac{\Gamma (\sum _{c=1}^C\alpha _{kc})}{\prod _{c=1}^C\Gamma (\alpha _{kc})}\left( \sum _{j=1}^d\sum _{c=1}^C\delta _{kc}(j)\,x_j\right) ^{\beta _{k}} \end{aligned}$$
(A12)

where,

$$\begin{aligned} \beta _{k}={\left\{ \begin{array}{ll} 0,& \text {if } k \text { is the root node}\\ \alpha _{parent(k)}-\sum _{c=1}^C\alpha _{kc},& \text {otherwise} \end{array}\right. } \end{aligned}$$
(A13)
Fig. 10 Tree structures of the NDD and Dirichlet-tree distribution

In (A7), the tree is traversed from the original variables upwards, so all variables are handled in a single equation. Instead of treating each nesting independently as in the NDD derivation, the indicator \(\delta _{kc}\) determines which terms contribute, according to \(\varvec{\delta }\), a 3D array indexed by nodes, branches, and original variables. Consequently, \(\varvec{\delta }\) needs to be defined for all the original variables in advance, and any change to the tree structure requires modifying the \(\varvec{\delta }\) array.
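
As an illustration, the log of the Dirichlet-tree density (A12) can be evaluated directly from such a \(\delta \) array, as in the minimal sketch below; passing \(\alpha _{parent(j)}\) and \(\beta _k\) explicitly, and the argument names themselves, are assumptions made for brevity.

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_tree_logpdf(x, alpha, delta, alpha_leaf, beta):
    """Log of (A12), written directly in terms of the indicator array delta.

    x          : length-d vector of leaf proportions
    alpha      : (K, C) array of branch parameters alpha_{kc}
    delta      : (K, C, d) 0/1 array; delta[k, c, j] = 1 iff branch kc leads to leaf x_j
    alpha_leaf : length-d vector of alpha_{parent(j)}, the parameter of the branch ending at leaf j
    beta       : length-K vector of exponents beta_k as defined in (A13)
    """
    x, alpha = np.asarray(x, float), np.asarray(alpha, float)
    # leaf term: prod_j x_j^{alpha_parent(j) - 1}
    logp = np.sum((np.asarray(alpha_leaf) - 1.0) * np.log(x))
    # per-node Dirichlet normalizers: Gamma(sum_c alpha_kc) / prod_c Gamma(alpha_kc)
    logp += np.sum(gammaln(alpha.sum(axis=1)) - gammaln(alpha).sum(axis=1))
    # node masses: sum_j sum_c delta_kc(j) x_j, raised to the power beta_k
    node_mass = np.einsum('kcj,j->k', np.asarray(delta, float), x)
    logp += np.sum(np.asarray(beta) * np.log(node_mass))
    return logp
```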

Appendix B: Dirichlet-tree Mixture Model

B.1 Derivation of DTMM

In order to formulate the Dirichlet-tree distribution PDF, the tree shown in Fig. 1 needs to be redefined as in Fig. 14, to show the similarity between the NDD and the Dirichlet-tree distribution. The PDF shown in (B14) is the same as the one used in (A12), where \(\bar{\alpha }_j=\alpha _{parent(j)}\).

$$\begin{aligned} P(X\mid \alpha )=\prod _{j=1}^d x_j^{\bar{\alpha }_j-1}\prod _{k=1}^K\dfrac{\Gamma (\sum _{c=1}^C\alpha _{kc})}{\prod _{c=1}^C\Gamma (\alpha _{kc})}\left( \sum _{j=1}^d\sum _{c=1}^C\delta _{kc}(j)\,x_j\right) ^{\beta _{k}} \end{aligned}$$
(B14)

where (\(\delta _{kc}\)) and (\(\beta _{k}\)) are defined in Eqs. A10 and A13, respectively. Accordingly,

$$\begin{aligned} P(X\mid \theta )=\sum _{m=1}^{M}P(X\mid \vec {\alpha }_m)P(m), \end{aligned}$$
(B15)

where,

$$\begin{aligned} \theta =\{\vec {\alpha }_1\ldots ,\vec {\alpha }_M,P(1),\ldots ,P(M)\}, \end{aligned}$$

and,

$$\begin{aligned} \vec {\alpha }_m=(\alpha _{kcm},\ldots , \alpha _{KCm}). \end{aligned}$$

Thus, for (N) independent observations \(\mathcal {X}=\{X_1,\ldots ,X_N\}\),

$$\begin{aligned} P(\mathcal {X}\mid \theta )=\prod _{i=1}^{N} {\sum _{m=1}^{M}P(X_i\mid \vec {\alpha }_m)P(m)}. \end{aligned}$$
(B16)

Accordingly, the complete-data log-likelihood shown in (B17) can be formulated for the two cases distinguished in (A13).

$$\begin{aligned} \mathcal {L}(\mathcal {X},\mathcal {Z}\mid \theta )=\sum _{i=1}^{N} \sum _{m=1}^{M} z_{im}\log {[P(X_i\mid \vec {\alpha }_m)P(m)]}. \end{aligned}$$
(B17)
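
For illustration, the responsibilities \(z_{im}\) that appear in (B17) can be computed in the E-step as sketched below for a generic component log-density; the function signature and the log-domain normalization are assumptions of this sketch.

```python
import numpy as np

def e_step(log_pdf, X, alphas, weights):
    """Posterior responsibilities z_{im} for the mixture in (B15)-(B16).

    log_pdf : callable(x, alpha) returning log P(x | alpha), e.g. an NDD or Dirichlet-tree log-density
    X       : sequence of N observations
    alphas  : list of M per-component parameter sets
    weights : length-M array of mixing proportions P(m)
    """
    N, M = len(X), len(weights)
    log_r = np.empty((N, M))
    for m in range(M):
        log_r[:, m] = np.log(weights[m]) + np.array([log_pdf(x, alphas[m]) for x in X])
    # normalize the rows in the log domain for numerical stability
    log_r -= log_r.max(axis=1, keepdims=True)
    r = np.exp(log_r)
    return r / r.sum(axis=1, keepdims=True)
```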

Based on Fig. 14, for the root node \((k=1)\):

$$\begin{aligned} \mathcal {L}(\mathcal {X},\mathcal {Z}\mid \theta )=\sum _{i=1}^{N}\sum _{m=1}^{M}z_{im}\Bigg [\sum _{j=1}^d\Bigg [\bar{\alpha }_{j}\log {x_j}-\log {x_j}+\sum _{k=1}^K\Bigg [\log {\Gamma \Big (\sum _{c=1}^C\alpha _{kc}\Big )}-\sum _{c=1}^C\log {\Gamma (\alpha _{kc})}\Bigg ]\Bigg ]+\log {P(m)}\Bigg ] \end{aligned}$$
(B18)

For non-root nodes \((k=[2,3])\):

$$\begin{aligned} \mathcal {L}(\mathcal {X},\mathcal {Z}\mid \theta )=\sum _{i=1}^N\sum _{m=1}^M z_{im}\Bigg [\sum _{j=1}^d\Bigg [\bar{\alpha }_{j}\log {x_j}-\log {x_j}+\sum _{k=1}^K\Bigg [\log {\Gamma \Big (\sum _{c=1}^C\alpha _{kc}\Big )}-\sum _{c=1}^C\log {\Gamma (\alpha _{kc})}+\bar{\alpha }_k\log {\Big (\sum _{j=1}^d\sum _{c=1}^C\delta _{kc}(j)x_j\Big )}-\sum _{c=1}^C\alpha _{kc}\log {\Big (\sum _{j=1}^d\sum _{c=1}^C\delta _{kc}(j)x_j\Big )}\Bigg ]\Bigg ]+\log {P(m)}\Bigg ] \end{aligned}$$
(B19)

From the above equations, the relation between \(\alpha _{kc}\) and \(\bar{\alpha }_k\) can be observed, since \(\alpha _{kc} = \bar{\alpha }_k\) for specific values of \(k\). This relation is determined by the tree structure, hence the importance of \(\delta _{kc}\) for defining it; for example, according to Fig. 14, \(\alpha _{11} = \bar{\alpha }_2\). This makes each tree structure distinct from the others, so each structure requires its own model derivation. In this paper, the derivations are made with respect to the tree shown in Fig. 14. To estimate the parameters within the EM framework, the Newton-Raphson method in (26) is used for the distribution parameters, while (25) defines the prior parameters. For non-root nodes \((k=[2,3],c=[1,2])\), parameter estimation is required for both \(\alpha _{kc}\) and \(\bar{\alpha }_k\) while using the deterministic annealing extension:

$$\begin{aligned} \frac{\partial \Theta (\theta ^*,\theta ,T)}{\partial \alpha _{kcm}^{(t)}}=\sum _{i=1}^N z_{im}\Bigg [\log {x_{kc}}+\Psi \Big (\sum _{c=1}^C\alpha _{kcm}\Big )-\Psi (\alpha _{kcm})\Bigg ] \end{aligned}$$
(B20)
$$\begin{aligned} \frac{\partial \Theta (\theta ^*,\theta ,T)}{\partial \bar{\alpha }_{km}^{(t)}}=\sum _{i=1}^Nz_{im}\left( \sum _{j=1}^d\sum _{c=1}^C\delta _{kc}(j)x_j\right) \end{aligned}$$
(B21)

while the parameter estimation for the root node \((k=1,c=[1,2])\) is only required for \(\alpha _{kc}\), as it has no parent node.

$$\begin{aligned} \frac{\partial \Theta (\theta ^*,\theta ,T)}{\partial \alpha _{kcm}^{(t)}}=\sum _{i=1}^Nz_{im}\left[ \Psi (\sum _{c=1}^C\alpha _{kcm})-\Psi (\alpha _{kcm})\right] . \end{aligned}$$
(B22)

However, the parent node of \(k=[2,3]\) is the node \(k=1\). Therefore, (B21) and (B22) can be combined for \(\alpha _{kc}\) with \(k=1\). Thus:

$$\begin{aligned} \frac{\partial \Theta (\theta ^*,\theta ,T)}{\partial \alpha _{kcm}^{(t)}}:\forall k \in [2,3]=\sum _{i=1}^N z_{im}\Bigg [\log {x_{kc}}+\Psi \Big (\sum _{c=1}^C\alpha _{kcm}\Big )-\Psi (\alpha _{kcm})\Bigg ] \end{aligned}$$
(B23)
$$\begin{aligned} \frac{\partial \Theta (\theta ^*,\theta ,T)}{\partial \alpha _{kcm}^{(t)}}:\forall k \in [1]=\sum _{i=1}^N z_{im}\Bigg [\Big (\sum _{j=1}^d\sum _{c=1}^C\delta _{kc}(j)x_j\Big )+\Psi \Big (\sum _{c=1}^C\alpha _{kcm}\Big )-\Psi (\alpha _{kcm})\Bigg ] \end{aligned}$$
(B24)

To define the Hessian matrix,

$$\begin{aligned} \frac{\partial ^2 \Theta (\theta ^*,\theta ,T)}{\partial \alpha _{kc_1m}^{(t)} \alpha _{kc_2m}^{(t)}} = \sum _{i=1}^Nz_{im}\left[ \Psi ^\prime (\sum _{c=1}^C\alpha _{kcm})\right] \end{aligned}$$
(B25)
$$\begin{aligned} \frac{\partial ^2 \Theta (\theta ^*,\theta ,T)}{\partial {\alpha _{kc_m}^{(t)}}^2} = \sum _{i=1}^Nz_{im}\left[ \Psi ^\prime (\sum _{c=1}^C\alpha _{kcm})-\Psi ^\prime (\alpha _{kcm})\right] \end{aligned}$$
(B26)
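
A single Newton-Raphson update for the branch parameters of one node can then be sketched as follows, assembling the Hessian from (B25) and (B26); the per-node blocking and the step-halving safeguard that keeps the parameters positive are assumptions of this sketch, not steps prescribed above.

```python
import numpy as np
from scipy.special import polygamma

def newton_step_node(alpha_k, grad_k, resp_sum):
    """One Newton-Raphson update for the C branch parameters alpha_{k1..kC} of node k
    in component m, with the Hessian assembled from (B25)-(B26).

    alpha_k  : length-C parameter vector
    grad_k   : length-C gradient, e.g. (B23) or (B24) evaluated at alpha_k
    resp_sum : sum_i z_{im}, the total responsibility of component m
    """
    alpha_k = np.asarray(alpha_k, float)
    C = len(alpha_k)
    trigamma = lambda a: polygamma(1, a)
    # off-diagonal entries share the term of (B25); the diagonal adds the correction of (B26)
    H = resp_sum * (trigamma(alpha_k.sum()) * np.ones((C, C)) - np.diag(trigamma(alpha_k)))
    step = np.linalg.solve(H, np.asarray(grad_k, float))
    new_alpha = alpha_k - step
    # keep the parameters positive; halve the step until the update stays feasible
    while np.any(new_alpha <= 0):
        step *= 0.5
        new_alpha = alpha_k - step
    return new_alpha
```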

Finally, \(\delta _{kc}(j)\) is a \((K\times C\times d)\) array defined by the tree structure. Based on Fig. 14, \(\delta _{kc}(j)\) is therefore a \((3\times 2\times 4)\) array and can be written as:

$$\begin{aligned} \delta (1)=\begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 0 \end{bmatrix},\quad \delta (2)=\begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix},\quad \delta (3)=\begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 1 & 0 \end{bmatrix},\quad \delta (4)=\begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 0 & 1 \end{bmatrix} \end{aligned}$$
(B27)
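
A minimal sketch of how this \((3\times 2\times 4)\) array can be stored is given below, using 0-based indices; the `delta[k, c, j]` layout is an assumed encoding chosen so that each slice over the last axis reproduces one of the matrices in (B27).

```python
import numpy as np

# delta[k, c, j] = 1 iff branch (k, c) lies on the path from the root to leaf x_j,
# for the 3-node, 2-branch tree of Fig. 14 (all indices 0-based).
delta = np.zeros((3, 2, 4), dtype=int)
delta[0, 0, [0, 1]] = 1   # root branch 1 leads to x_1 and x_2
delta[0, 1, [2, 3]] = 1   # root branch 2 leads to x_3 and x_4
delta[1, 0, 0] = 1        # node 2, branch 1 -> x_1
delta[1, 1, 1] = 1        # node 2, branch 2 -> x_2
delta[2, 0, 2] = 1        # node 3, branch 1 -> x_3
delta[2, 1, 3] = 1        # node 3, branch 2 -> x_4

# each slice delta[:, :, j] reproduces delta(j+1) of (B27), e.g.
print(delta[:, :, 0])     # [[1 0], [1 0], [0 0]]
```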

B.2 Results verification

This section compares the results of the NDMM and DTMM against the DMM and the original parameter values under the same initialization hyperparameters. Table 12 shows that the NDMM and DTMM yield the same results, confirming that the two formulations are equivalent.

Table 12 Comparison between the yielded \(\alpha \) parameters for the DMM, NDMM and DTMM

Appendix C: Proof of exponential properties of NDD

In this section, the pertinence of the NDD to the exponential family is proven, along with the properties that allow the Bhattacharyya kernel and the Kullback-Leibler divergence to be obtained in closed form.

C.1 Proof of exponential pertinence

First, the exponential of the natural logarithm is taken for the PDF of the NDD as shown in (C28).

$$\begin{aligned} P(X\mid \theta )&=\exp {[\log P(X\mid \theta )]} \\ &=\frac{\prod _{j=1}^d x_j^{\alpha _j-1} \prod _{k=1}^K x_{d+k}^{\alpha _{d+k}-\bar{A}_k}}{\prod _{k=0}^K \frac{\prod _{j \in k} \Gamma \left( \alpha _j\right) }{\Gamma \left( \sum _{j \in k}\alpha _j\right) }} \end{aligned}$$
(C28)

To verify that the distribution belongs to the exponential family, it needs to be written in the form shown in (C29):

$$\begin{aligned} P(X\mid \theta )=\exp {[A(X)+\theta ^T T(X)-K(\theta )]} \end{aligned}$$
(C29)

where A is the measure function, T is the sufficient statistics, and K is the cumulant generating function [36]. Therefore, by expanding (C28),

$$\begin{aligned} P(X \mid \theta )&=\exp \Bigg [\sum _{j=1}^d\alpha _j\log {x_j}-\sum _{j=1}^d \log {x_j}+\sum _{k=1}^K\alpha _{d+k}\log {x_{d+k}}-\sum _{k=1}^K\bar{A}_k\log {x_{d+k}}+\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}\alpha _{j}\Big )}-\sum _{k=0}^K\sum _{j\in k}\log {\Gamma (\alpha _j)} \Bigg ]\\ &=\exp \Bigg [\sum _{j=1}^d\alpha _j\log {x_j}-\sum _{j=1}^d \log {x_j}+\sum _{k=1}^K\alpha _{d+k}\log {x_{d+k}}-\sum _{k=1}^K\sum _{j\in k}\alpha _{j}\log {x_{d+k}}+\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}\alpha _{j}\Big )}-\sum _{k=0}^K\sum _{j\in k}\log {\Gamma (\alpha _j)} \Bigg ] \end{aligned}$$
(C30)

Finally, (C30) can be written as:

$$\begin{aligned} A(X)&=-\sum _{j=1}^d\log {x_j}\\ K(\theta )&=\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}\alpha _{j}\Big )}-\sum _{k=0}^K\sum _{j\in k}\log {\Gamma (\alpha _j)}\\ \theta &=(\alpha _{1},\dots ,\alpha _{d},\alpha _{d+1},\dots , \alpha _{d+K},\bar{A}_1,\dots ,\bar{A}_K)\\ &=\Big (\alpha _{1},\dots ,\alpha _{d},\alpha _{d+1},\dots , \alpha _{d+K},\sum _{j\in k(1)}\alpha _{j},\dots ,\sum _{j\in k(K)}\alpha _{j}\Big )\\ T(X)&=(\log {x_1},\dots ,\log {x_d},\log {x_{d+1}},\dots ,\log {x_{d+K}},\log {x_{d+1}},\dots ,\log {x_{d+K}}) \end{aligned}$$
(C31)

C.2 Bhattacharyya distance

$$\begin{aligned} BH(P(X \mid \theta ),P(X \mid \theta ^\prime ))=\exp {\left[ K\left( \dfrac{1}{2}\theta +\dfrac{1}{2}\theta ^\prime \right) -\dfrac{1}{2}K(\theta )-\dfrac{1}{2}K(\theta ^\prime )\right] } \end{aligned}$$
(C32)
$$\begin{aligned} BH(P(X \mid \theta ),P(X \mid \theta ^\prime ))&=\exp \Bigg [\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}\frac{\alpha _j+{\alpha _j}^\prime }{2}\Big )}-\sum _{k=0}^K\sum _{j\in k}\log {\Gamma \Big (\frac{\alpha _j+{\alpha _j}^\prime }{2}\Big )}-\frac{1}{2}\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}\alpha _j\Big )}+\frac{1}{2}\sum _{k=0}^K\sum _{j\in k}\log {\Gamma (\alpha _j)}-\frac{1}{2}\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}{\alpha _j}^\prime \Big )}+\frac{1}{2}\sum _{k=0}^K\sum _{j\in k}\log {\Gamma ({\alpha _j}^\prime )}\Bigg ]\\ &=\frac{\prod _{k=0}^K \Gamma \Big (\sum _{j \in k} \frac{\alpha _j+{\alpha _j}^{\prime }}{2}\Big ) \sqrt{\prod _{k=0}^K \prod _{j \in k} \Gamma \left( \alpha _j\right) \Gamma \left( {\alpha _j}^{\prime }\right) }}{\prod _{k=0}^K\prod _{j\in k} \Gamma \Big (\frac{\alpha _j+{\alpha _j}^{\prime }}{2}\Big ) \sqrt{\prod _{k=0}^K \Gamma \Big (\sum _{j \in k}{\alpha _j}\Big )\Gamma \Big (\sum _{j \in k} {\alpha _j}^{\prime }\Big )}} \end{aligned}$$
(C33)
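
The expression above can be evaluated with log-gamma functions. The sketch below is a direct transcription of (C32) with \(K(\theta )\) taken exactly as written in (C31); the per-node grouping of the parameters and the assumption that both NDDs share the same tree are conventions of this illustration.

```python
import numpy as np
from scipy.special import gammaln

def K_theta(node_params):
    """K(theta) as written in (C31): the sum over tree nodes of
    log Gamma(sum of the node's parameters) minus the sum of log Gamma of each parameter."""
    return sum(gammaln(np.sum(a)) - np.sum(gammaln(a)) for a in node_params)

def bhattacharyya_ndd(node_params_1, node_params_2):
    """Transcription of (C32): exp[ K(theta/2 + theta'/2) - K(theta)/2 - K(theta')/2 ].

    node_params_1, node_params_2 : lists of per-node parameter vectors A_0..A_K of the two NDDs,
    which are assumed to be defined on the same tree.
    """
    mid = [(np.asarray(a) + np.asarray(b)) / 2.0 for a, b in zip(node_params_1, node_params_2)]
    return np.exp(K_theta(mid) - 0.5 * K_theta(node_params_1) - 0.5 * K_theta(node_params_2))

# example with a two-level tree (root node A_0 plus two nested groups of two leaves)
p = [np.array([4.0, 6.0]), np.array([2.0, 3.0]), np.array([1.5, 2.5])]
q = [np.array([5.0, 5.0]), np.array([1.0, 4.0]), np.array([2.0, 2.0])]
print(bhattacharyya_ndd(p, q))
```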

C.3 Kullback-Leibler divergence

$$\begin{aligned} KL(P(X \mid \theta ),P(X \mid \theta ^\prime ))&=\int {p(x)\log {\frac{p(x)}{q(x)}}}\,dx=\left\langle \log {\frac{p(x)}{q(x)}}\right\rangle _{p(x)}\\ &=\left\langle \log {\frac{\dfrac{\prod _{j=1}^{d} {x_j}^{\alpha _j-1} \prod _{k=1}^{K} {x_{d+k}}^{\alpha _{d+k}-\bar{A}_k}}{\prod _{k=0}^{K} B(A_k)}}{\dfrac{\prod _{j=1}^{d} {x_j}^{\beta _j-1} \prod _{k=1}^{K} {x_{d+k}}^{\beta _{d+k}-\bar{B}_k}}{\prod _{k=0}^{K} B(B_k)}}}\right\rangle _{p(x)} \end{aligned}$$
(C34)

By Expansion,

$$\begin{aligned} KL(P(X \mid \theta ),P(X \mid \theta ^\prime ))&=\Bigg [\sum _{j=1}^d\alpha _j\log {x_j}-\sum _{j=1}^d \log {x_j}+\sum _{k=1}^K\alpha _{d+k}\log {x_{d+k}}-\sum _{k=1}^K\bar{A}_k\log {x_{d+k}}+\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}\alpha _{j}\Big )}-\sum _{k=0}^K\sum _{j\in k}\log {\Gamma (\alpha _j)} \Bigg ]\\ &\quad -\Bigg [\sum _{j=1}^d\beta _j\log {x_j}-\sum _{j=1}^d \log {x_j}+\sum _{k=1}^K\beta _{d+k}\log {x_{d+k}}-\sum _{k=1}^K\bar{B}_k\log {x_{d+k}}+\sum _{k=0}^K\log {\Gamma \Big (\sum _{j\in k}\beta _{j}\Big )}-\sum _{k=0}^K\sum _{j\in k}\log {\Gamma (\beta _j)} \Bigg ] \end{aligned}$$
(C35)
$$\begin{aligned} KL(P(X \mid \theta ),P(X \mid \theta ^\prime ))&=\sum _{j=1}^d(\alpha _j-\beta _j)\langle \log {x_j}\rangle _{p(x)}+\sum _{k=1}^K(\alpha _{d+k}+\bar{B}_k-\beta _{d+k}-\bar{A}_k)\langle \log {x_{d+k}}\rangle _{p(x)}\\ &\quad +\sum _{k=0}^K\left( \log {\Gamma \Big (\sum _{j\in k}\alpha _{j}\Big )}-\log {\Gamma \Big (\sum _{j\in k}\beta _{j}\Big )}\right) +\sum _{k=0}^K\sum _{j\in k}\left( \log {\Gamma (\beta _{j})}-\log {\Gamma (\alpha _{j})}\right) \end{aligned}$$
(C36)

where,

$$\begin{aligned} \langle \log {x_j}\rangle _{p(x)}&=\Psi (\alpha _{j})-\Psi \left( \sum _{j=1}^d\alpha _j\right) \\ \langle \log {x_{d+k}}\rangle _{p(x)}&=\Psi (\alpha _{d+k})-\Psi \left( \sum _{k=1}^K\alpha _{d+k}\right) \end{aligned}$$
(C37)
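
For completeness, the following sketch transcribes the closed form (C36) together with the expectations in (C37); the flat parameter layout, the `groups` encoding of the tree, and the assumption that both distributions share the same tree are conventions of this illustration.

```python
import numpy as np
from scipy.special import gammaln, psi

def ndd_kl(alpha, beta, groups):
    """Transcription of (C36) with the expectations of (C37).

    alpha, beta : length-(d+K) parameter vectors of the two NDDs (same tree assumed),
                  with the d leaf parameters first and the K nested-node parameters last
    groups      : list of K index lists; groups[k] = I_k, the leaves under nested node k
    """
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    K = len(groups)
    d = len(alpha) - K
    Abar = np.array([alpha[list(g)].sum() for g in groups])
    Bbar = np.array([beta[list(g)].sum() for g in groups])
    # expectations of (C37)
    e_log_leaf = psi(alpha[:d]) - psi(alpha[:d].sum())
    e_log_sum = psi(alpha[d:]) - psi(alpha[d:].sum())
    kl = np.sum((alpha[:d] - beta[:d]) * e_log_leaf)
    kl += np.sum((alpha[d:] + Bbar - beta[d:] - Abar) * e_log_sum)
    # node-wise log-Gamma terms, node 0 being the root with parameter set A_0
    nested = {j for g in groups for j in g}
    root = [j for j in range(d) if j not in nested] + list(range(d, d + K))
    for idx in [root] + [list(g) for g in groups]:
        kl += gammaln(alpha[idx].sum()) - gammaln(beta[idx].sum())
        kl += np.sum(gammaln(beta[idx]) - gammaln(alpha[idx]))
    return kl
```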

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Cite this article

Alkhawaja, F., Bouguila, N. Unsupervised nested Dirichlet finite mixture model for clustering. Appl Intell 53, 25232–25258 (2023). https://doi.org/10.1007/s10489-023-04888-8
