Infinite Dirichlet mixture models learning via expectation propagation

Fan, Wentao; Bouguila, Nizar

doi:10.1007/s11634-013-0152-4


Regular Article
Published: 21 September 2013

Volume 7, pages 465–489 (2013)

Advances in Data Analysis and Classification

Wentao Fan¹ &
Nizar Bouguila²

585 Accesses
5 Citations

Abstract

In this article, we propose a novel Bayesian nonparametric clustering algorithm based on a Dirichlet process mixture of Dirichlet distributions, which have been shown to be very flexible for modeling proportional data. The idea is to let the number of mixture components increase as new data to cluster arrive, so that the model selection problem (i.e., determining the number of clusters) can be answered without recourse to classic selection criteria. Thus, the proposed model can be considered an infinite Dirichlet mixture model. An expectation propagation inference framework is developed to learn this model by obtaining a full posterior distribution on its parameters. Within this learning framework, the model complexity and all the involved parameters are evaluated simultaneously. To show the practical relevance and efficiency of our model, we perform a detailed analysis using extensive simulations based on both synthetic and real data. In particular, the real data are drawn from three challenging applications, namely image categorization, anomaly intrusion detection, and video summarization.

Notes

Proportional data are data subject to two constraints: non-negativity and unit sum.
All figures with colors can be found in the electronic version of the paper.
Source code of PCA-SIFT: http://www.cs.cmu.edu/~yke/pcasift.
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
A connection is a sequence of TCP packets starting and ending at some well defined times, between which data flows to and from a source IP address to a target IP address under some well defined protocol.

References

Bishop CM (1999) Variational principal components. In: Proceedings of international conference on artificial neural networks (ICANN), vol. 1, pp 509–514
Blackwell D, MacQueen J (1973) Ferguson distributions via Pólya urn schemes. Ann Stat 1(2):353–355
Blei DM, Jordan MI (2005) Variational inference for Dirichlet process mixtures. Bayesian Anal 1:121–144
Bosch A, Zisserman A, Munoz X (2006) Scene classification via pLSA. In: Proceedings of 9th European conference on computer vision (ECCV), pp 517–530
Bouguila N (2007) Spatial color image databases summarization. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP), Honolulu, pp I-953–I-956
Bouguila N (2012) Infinite Liouville mixture models with application to text and texture categorization. Pattern Recognit Lett 33(2):103–110
Bouguila N, Ziou D (2005a) MML-based approach for finite Dirichlet mixture estimation and selection. In: Perner P, Imiya A (eds) MLDM. Lecture Notes in Computer Science, vol 3587. Springer, Berlin, pp 42–51
Bouguila N, Ziou D (2005b) On fitting finite Dirichlet mixture using ECM and MML. In: Singh S, Singh M, Apté C, Perner P (eds) ICAPR (1). Lecture Notes in Computer Science, vol 3686. Springer, Berlin, pp 172–182
Bouguila N, Ziou D (2005c) Using unsupervised learning of a finite Dirichlet mixture model to improve pattern recognition applications. Pattern Recognit Lett 26(12):1916–1925
Bouguila N, Ziou D (2006a) Online clustering via finite mixtures of Dirichlet and minimum message length. Eng Appl Artif Intell 19(4):371–379
Bouguila N, Ziou D (2006b) Unsupervised selection of a finite Dirichlet mixture model: an MML-based approach. IEEE Trans Knowl Data Eng 18(8):993–1009
Bouguila N, Ziou D (2008) A Dirichlet process mixture of Dirichlet distributions for classification and prediction. In: Proceedings of the IEEE workshop on machine learning for signal processing (MLSP), pp 297–302
Bouguila N, Ziou D (2010) A Dirichlet process mixture of generalized Dirichlet distributions for proportional data modeling. IEEE Trans Neural Netw 21(1):107–122
Bouguila N, Wang JH, Hamza AB (2010) Software modules categorization through likelihood and Bayesian analysis of finite Dirichlet mixtures. J Appl Stat 37(2):235–252
Chang S, Dasgupta N, Carin L (2005) A Bayesian approach to unsupervised feature selection and density estimation using expectation propagation. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 1043–1050
Csurka G, Dance CR, Fan L, Willamowski J, Bray C (2004) Visual categorization with bags of keypoints. In: Workshop on statistical learning in computer vision, 8th European conference on computer vision (ECCV), pp 1–22
Draper BA, Hanson AR, Riseman EM (1996) Knowledge-directed vision: control, learning, and integration. Proc IEEE 84:1625–1637
Drummond T, Caelli T (2000) Learning task-specific object recognition and scene understanding. Comput Vis Image Underst 80:315–348
Elkan C (2003) Using the triangle inequality to accelerate k-means. In: Proceedings of the international conference on machine learning (ICML), pp 147–153
Fan W, Bouguila N, Ziou D (2012) Variational learning for finite Dirichlet mixture models and applications. IEEE Trans Neural Netw Learn Syst 23(5):762–774
Ferguson TS (1983) Bayesian density estimation by mixtures of normal distributions. Recent Adv Stat 24:287–302
Fraley C, Raftery AE (2003) Enhanced model-based clustering, density estimation, and discriminant analysis software: MCLUST. J Classif 20(2):263–286
Gibson D, Campbell N, Thomas B (2002) Visual abstraction of wildlife footage using Gaussian mixture models and the minimum description length criterion. In: Proceedings of international conference on pattern recognition (ICPR), vol. 2, pp 814–817
Gong Y, Liu X (2000) Video summarization using singular value decomposition. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), vol. 2, pp 174–180
Hansen KM, Tukey JW (1992) Tuning a major part of a clustering algorithm. Int Stat Rev 60(1):21–43
Hofmann T (2001) Unsupervised learning by probabilistic latent semantic analysis. Mach Learn 42(1/2):177–196
Hu W, Hu W, Maybank S (2008) Adaboost-based algorithm for network intrusion detection. IEEE Trans Syst Man Cybern Part B Cybern 38(2):577–583
Ishwaran H, James LF (2001) Gibbs sampling methods for stick-breaking priors. J Am Stat Assoc 96:161–173
Ke Y, Sukthankar R (2004) PCA-SIFT: a more distinctive representation for local image descriptors. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), pp 506–513
Khan L, Awad M, Thuraisingham B (2007) A new intrusion detection system using support vector machines and hierarchical clustering. VLDB J 16:507–521
Korwar RM, Hollander M (1973) Contributions to the theory of Dirichlet processes. Ann Probab 1:705–711
Lin TI, Lee JC, Ho HJ (2006) On fast supervised learning for normal mixture models with missing information. Pattern Recognit Lett 39:1177–1187
Lippmann R, Haines JW, Fried DJ, Korba J, Das K (2000) Analysis and results of the 1999 DARPA off-line intrusion detection evaluation. In: Proceedings of the third international workshop on recent advances in intrusion detection. Springer, Berlin, pp 162–182
Liu T, Zhang HJ, Qi F (2003) A novel video key-frame-extraction algorithm based on perceived motion energy model. IEEE Trans Circuits Syst Video Technol 13(10):1006–1013
Liu Y, Chen K, Liao X, Zhang W (2004) A genetic clustering method for intrusion detection. Pattern Recognit 37(5):927–942
Ma Z, Leijon A (2010) Expectation propagation for estimating the parameters of the beta distribution. In: Proceedings of the IEEE international conference on acoustics, speech, and signal processing (ICASSP), pp 2082–2085
Maybeck PS (1982) Stochastic models, estimation and control. Academic Press, London
McHugh J, Christie A, Allen J (2000) Defending yourself: the role of intrusion detection systems. IEEE Softw 17(5):42–51
McLachlan G, Peel D (2000) Finite mixture models. Wiley, New York
Mikolajczyk K, Schmid C (2005) A performance evaluation of local descriptors. IEEE Trans Pattern Anal Mach Intell 27(10):1615–1630
Minka T (2001) Expectation propagation for approximate Bayesian inference. In: Proceedings of the conference on uncertainty in artificial intelligence (UAI), pp 362–369
Minka T, Ghahramani Z (2003) Expectation propagation for infinite mixtures. In: NIPS’03 workshop on nonparametric Bayesian methods and infinite models
Minka T, Lafferty J (2002) Expectation-propagation for the generative aspect model. In: Proceedings of the conference on uncertainty in artificial intelligence (UAI), pp 352–359
Neal RM (2000) Markov chain sampling methods for Dirichlet process mixture models. J Comput Graph Stat 9(2):249–265
Ngo CW, Ma YF, Zhang HJ (2003) Automatic video summarization by graph modeling. In: Proceedings of IEEE international conference on computer vision (ICCV), vol. 1, pp 104–109
Nilsback ME, Zisserman A (2006) A visual vocabulary for flower classification. In: Proceedings of the IEEE computer society conference on computer vision and pattern recognition (CVPR), vol 2, pp 1447–1454
Northcutt S, Novak J (2002) Network intrusion detection: an analyst’s handbook. New Riders Publishing
Pollard D (1982) A central limit theorem for k-means clustering. Ann Probab 10(4):919–926
Rasiwasia N, Vasconcelos N (2008) Scene classification with low-dimensional semantic spaces and weak supervision. In: Proceedings of IEEE conference on computer vision and pattern recognition (CVPR), pp 1–8
Rasmussen CE (2000) The infinite Gaussian mixture model. In: Proceedings of advances in neural information processing systems (NIPS). MIT Press, Cambridge, pp 554–560
Robert C, Casella G (1999) Monte Carlo statistical methods. Springer, Berlin
Sahouria E, Zakhor A (1999) Content analysis of video using principal components. IEEE Trans Circuits Syst Video Technol 9(8):1290–1298
Sethuraman J (1994) A constructive definition of Dirichlet priors. Stat Sin 4:639–650
Shen X, Ye J (2002) Adaptive model selection. J Am Stat Assoc 97(457):210–221
Singh S, Haddon J, Markou M (2001) Nearest-neighbour classifiers in natural scene analysis. Pattern Recognit 34:1601–1612
Teh YW, Jordan MI, Beal MJ, Blei DM (2004) Hierarchical Dirichlet processes. J Am Stat Assoc 101:705–711
Truong BT, Venkatesh S (2007) Video abstraction: a systematic review and classification. ACM Trans Multimed Comput Commun Appl 3(1)
Wong MA, Lane T (1983) A kth nearest neighbour clustering procedure. J R Stat Soc Ser B (Methodological) 45(3):362–368
Ye N, Li X, Chen Q, Erman SM, Xu M (2001) Probabilistic techniques for intrusion detection based on computer audit data. IEEE Trans Syst Man Cybern Part A 31(4):266–274

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Concordia University, Montreal, QC, Canada
Wentao Fan
Concordia Institute for Information Systems Engineering (CIISE), Concordia University, Montreal, QC, Canada
Nizar Bouguila

Corresponding author

Correspondence to Nizar Bouguila.

The calculation of $Z_i$ in Eq. (17)

The normalizing constant $Z_i$ in Eq. (17) can be calculated as

$$\begin{aligned} Z_i = \int f_i(\varTheta )q^{\setminus i}(\varTheta )\mathrm{d}\varTheta = \sum _{j=1}^M \bar{\lambda }_j\prod _{s=1}^{j-1}(1-\bar{\lambda }_s) \int \mathrm{Dir}\left( \mathbf{X}_i|{\varvec{\alpha }}_j\right) \mathcal{N}\left( {\varvec{\alpha }}_j|{\varvec{\mu }}_j^{\setminus i},A_j^{\setminus i}\right) \mathrm{d}{\varvec{\alpha }}_j \end{aligned}$$

(27)

where $\bar{\lambda }_j$ is the expected value of $\lambda _j$. Since the integration involved in Eq. (27) is analytically intractable, we tackle this problem by adopting the Laplace approximation to approximate the integrand with a Gaussian distribution as suggested in Ma and Leijon (2010).
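As a small illustration of the mixing weights in Eq. (27), the factor $\bar{\lambda }_j\prod _{s=1}^{j-1}(1-\bar{\lambda }_s)$ can be accumulated with a running product. The Python sketch below is not part of the original derivation; the function name `stick_breaking_weights` and the example values are assumptions chosen only for illustration.

```python
def stick_breaking_weights(lam_bar):
    """Return w_j = lam_bar[j] * prod_{s<j} (1 - lam_bar[s]) for each component j."""
    weights = []
    remaining = 1.0  # part of the unit stick not yet assigned to a component
    for lam in lam_bar:
        weights.append(lam * remaining)
        remaining *= 1.0 - lam
    return weights


# Hypothetical expected values of lambda_j for a truncation at M = 3.
weights = stick_breaking_weights([0.5, 0.5, 1.0])
print(weights)  # [0.5, 0.25, 0.25]
```

Setting the last expected value to 1 in this example makes the truncated weights sum to one; whether the truncation is handled this way in practice is an assumption of the sketch.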

First, we define $h({\varvec{\alpha }}_j)$ as the integrand in Eq. (27):

$$\begin{aligned} h({\varvec{\alpha }}_j) =\mathrm{Dir }(\mathbf X _i|{\varvec{\alpha }}_j)\mathcal{N }\left( {\varvec{\alpha }}_j|\varvec{\mu }_j^{\setminus i},A^{\setminus i}_{j}\right) \end{aligned}$$

(28)

Then, the normalized distribution for this integrand, which is a product of a Dirichlet distribution and a Gaussian distribution, is given by

$$\begin{aligned} \mathcal{H }({\varvec{\alpha }}_j) =\frac{h({\varvec{\alpha }}_j)}{\int h({\varvec{\alpha }}_j)d{\varvec{\alpha }}_j} \end{aligned}$$

(29)

The goal of the Laplace method is to find a Gaussian approximation centered on the mode of the distribution $\mathcal{H }({\varvec{\alpha }}_j)$. We may obtain the mode ${\varvec{\alpha }}_j^*$ numerically by setting the first derivative of $\ln h({\varvec{\alpha }}_j)$ to 0, where $\ln h({\varvec{\alpha }}_j)$ is given by

$$\begin{aligned} \ln h({\varvec{\alpha }}_j)&= \ln \frac{\varGamma \left( \sum _{l=1}^D\alpha _{jl}\right) }{\prod _{l=1}^D\varGamma (\alpha _{jl})} + \sum _{l=1}^D(\alpha _{jl}-1)\ln X_{il} \\&\quad - \frac{1}{2}\left( {\varvec{\alpha }}_j - {\varvec{\mu }}^{\setminus i}_{j}\right) ^T A^{\setminus i}_j \left( {\varvec{\alpha }}_j - {\varvec{\mu }}^{\setminus i}_{j}\right) + \text{const.} \end{aligned}$$

(30)
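To make Eq. (30) concrete, the following Python sketch evaluates $\ln h({\varvec{\alpha }}_j)$ up to its additive constant, using the Dirichlet log-normalizer $\ln \varGamma (\sum _l \alpha _{jl}) - \sum _l \ln \varGamma (\alpha _{jl})$ and treating $A^{\setminus i}_j$ as a precision matrix. The function name `log_h`, the list-of-lists matrix layout, and the example inputs are assumptions made for illustration.

```python
import math


def log_h(alpha, x, mu, A):
    """ln h(alpha) as in Eq. (30), with the additive constant dropped.

    alpha: Dirichlet parameters (all > 0); x: one proportional observation;
    mu, A: mean and precision matrix of the Gaussian factor.
    """
    D = len(alpha)
    # Dirichlet log-density of x given alpha.
    log_dir = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    log_dir += sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))
    # Quadratic form (alpha - mu)^T A (alpha - mu) of the Gaussian factor.
    diff = [a - m for a, m in zip(alpha, mu)]
    quad = sum(diff[r] * A[r][c] * diff[c] for r in range(D) for c in range(D))
    return log_dir - 0.5 * quad


# Hypothetical inputs: a uniform Dirichlet evaluated at the Gaussian mean.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
value = log_h([1.0, 1.0, 1.0], [0.2, 0.3, 0.5], [1.0, 1.0, 1.0], identity)
print(value)  # equals ln 2 up to rounding, the log-density of the uniform Dirichlet
```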

Subsequently, we can calculate the first and second derivatives with respect to ${\varvec{\alpha }}_j$ as

$$\begin{aligned} \frac{\partial \ln h({\varvec{\alpha }}_j)}{\partial {\varvec{\alpha }}_j} =\left[ \begin{array}{c} \varPsi \left( \displaystyle \sum \limits _{l=1}^D\alpha _{jl}\right) - \varPsi (\alpha _{j1}) + \ln X_{i1}\\ \vdots \\ \varPsi \left( \displaystyle \sum \limits _{l=1}^D\alpha _{jl}\right) - \varPsi (\alpha _{jD}) + \ln X_{iD} \end{array}\right] -A_j^{\setminus i}\left( {\varvec{\alpha }}_j- {\varvec{\mu }}^{\setminus i}_j\right) \end{aligned}$$

(31)

and

$$\begin{aligned} \frac{\partial ^2\ln h({\varvec{\alpha }}_j)}{\partial {\varvec{\alpha }}_j^2} = \left[ \begin{array}{c@{\quad }c@{\quad }c} \varPsi '\left( \displaystyle \sum \limits _{l=1}^D\alpha _{jl}\right) - \varPsi '(\alpha _{j1}) &{} \cdots &{} \varPsi '\left( \displaystyle \sum \limits _{l=1}^D\alpha _{jl}\right) \\ \vdots &{} \ddots &{}\vdots \\ \varPsi '\left( \displaystyle \sum \limits _{l=1}^D\alpha _{jl}\right) &{} \cdots &{}\varPsi '\left( \displaystyle \sum \limits _{l=1}^D\alpha _{jl}\right) - \varPsi '(\alpha _{jD}) \end{array}\right] -A^{\setminus i}_{j}\nonumber \\ \end{aligned}$$

(32)

where $\varPsi (\cdot )$ is the digamma function and $\varPsi '(\cdot )$ is its derivative, the trigamma function. Then, we can approximate $h({\varvec{\alpha }}_j)$ using the obtained mode as

$$\begin{aligned} h({\varvec{\alpha }}_j)\simeq h({\varvec{\alpha }}_j^*)\exp \bigg (-\frac{1}{2}({\varvec{\alpha }}_j- {\varvec{\alpha }}_j^*)^T\widehat{A}_{j}({\varvec{\alpha }}_j- {\varvec{\alpha }}_j^*)\bigg ) \end{aligned}$$

(33)

where the precision matrix $\widehat{A}_{j}$ is given by

$$\begin{aligned} \widehat{A}_{j} = - \left. \frac{\partial ^2\ln h({\varvec{\alpha }}_j)}{\partial {\varvec{\alpha }}_j^2} \right| _{{\varvec{\alpha }}_j ={\varvec{\alpha }}_j^*} \end{aligned}$$

(34)
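The mode search and Eqs. (31), (32) and (34) can be sketched as a Newton iteration. The text only says the mode is obtained numerically, so everything specific here is an assumption: the pure-Python `digamma` and `trigamma` approximations (upward recurrence followed by an asymptotic series), the starting point at ${\varvec{\mu }}^{\setminus i}_j$, the step halving that keeps ${\varvec{\alpha }}_j$ positive, and all function names and example values.

```python
import math


def digamma(x):
    """Psi(x) for x > 0: upward recurrence, then an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return acc + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x):
    """Psi'(x) for x > 0: upward recurrence, then an asymptotic series."""
    acc = 0.0
    while x < 6.0:
        acc += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return acc + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))


def grad_log_h(alpha, x, mu, A):
    """Gradient of ln h, Eq. (31)."""
    D = len(alpha)
    psi_sum = digamma(sum(alpha))
    diff = [a - m for a, m in zip(alpha, mu)]
    return [psi_sum - digamma(alpha[l]) + math.log(x[l])
            - sum(A[l][c] * diff[c] for c in range(D)) for l in range(D)]


def hess_log_h(alpha, A):
    """Hessian of ln h, Eq. (32)."""
    D = len(alpha)
    tri_sum = trigamma(sum(alpha))
    return [[tri_sum - (trigamma(alpha[r]) if r == c else 0.0) - A[r][c]
             for c in range(D)] for r in range(D)]


def solve(M, b):
    """Solve M y = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            f = aug[r][k] / aug[k][k]
            for c in range(k, n + 1):
                aug[r][c] -= f * aug[k][c]
    y = [0.0] * n
    for k in range(n - 1, -1, -1):
        y[k] = (aug[k][n] - sum(aug[k][c] * y[c] for c in range(k + 1, n))) / aug[k][k]
    return y


def find_mode(x, mu, A, tol=1e-10, max_iter=100):
    """Return the mode alpha* of ln h and the precision matrix A_hat of Eq. (34)."""
    alpha = list(mu)  # assumed starting point; requires positive entries
    for _ in range(max_iter):
        g = grad_log_h(alpha, x, mu, A)
        if max(abs(v) for v in g) < tol:
            break
        step = solve(hess_log_h(alpha, A), g)  # Newton direction H^{-1} g
        t = 1.0
        while any(a - t * s <= 0.0 for a, s in zip(alpha, step)):
            t *= 0.5  # shrink the step so that alpha stays positive
        alpha = [a - t * s for a, s in zip(alpha, step)]
    A_hat = [[-h for h in row] for row in hess_log_h(alpha, A)]
    return alpha, A_hat


# Hypothetical two-dimensional example.
x_i, mu_j, A_j = [0.4, 0.6], [3.0, 4.0], [[4.0, 0.0], [0.0, 4.0]]
alpha_star, A_hat = find_mode(x_i, mu_j, A_j)
print(alpha_star, A_hat)
```

Because the Dirichlet log-density is concave in its parameters and the Gaussian term is a concave quadratic, $\ln h$ has a single mode, which is why a plain Newton iteration is a reasonable choice in this sketch.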

Therefore, the integration of $h({\varvec{\alpha }}_j)$ can be approximated by using Eq. (33) as

$$\begin{aligned} \int h({\varvec{\alpha }}_j)d{\varvec{\alpha }}_j \simeq h({\varvec{\alpha }}_j^*)\int \exp \left( -\frac{1}{2}({\varvec{\alpha }}_j-{\varvec{\alpha }}_j^*)^T\widehat{A}_{j}({\varvec{\alpha }}_j-{\varvec{\alpha }}_j^*)\right) d{\varvec{\alpha }}_j = h({\varvec{\alpha }}_j^*) \frac{(2\pi )^{D/2}}{|\widehat{A}_j|^{1/2}} \end{aligned}$$

(35)
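Eq. (35) is straightforward to evaluate once $\ln h({\varvec{\alpha }}_j^*)$ and $\widehat{A}_j$ are available. The Python sketch below works in the log domain to avoid underflow and obtains $\ln |\widehat{A}_j|$ from a Cholesky factorization; both choices, and the names `log_det_spd` and `log_laplace_integral`, are assumptions rather than part of the original derivation.

```python
import math


def log_det_spd(M):
    """ln|M| for a symmetric positive-definite matrix via Cholesky factorization."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(r + 1):
            s = M[r][c] - sum(L[r][k] * L[c][k] for k in range(c))
            L[r][c] = math.sqrt(s) if r == c else s / L[c][c]
    return 2.0 * sum(math.log(L[k][k]) for k in range(n))


def log_laplace_integral(log_h_star, A_hat):
    """ln of h(alpha*) * (2 pi)^(D/2) / |A_hat|^(1/2), the right-hand side of Eq. (35)."""
    D = len(A_hat)
    return log_h_star + 0.5 * D * math.log(2.0 * math.pi) - 0.5 * log_det_spd(A_hat)


# Sanity check on an integrand that is exactly Gaussian, h(a) = exp(-a^T A_hat a / 2),
# for which the Laplace approximation is exact and h at the mode equals 1.
A_hat = [[2.0, 0.0], [0.0, 8.0]]
approx = math.exp(log_laplace_integral(0.0, A_hat))
print(approx)  # close to pi / 2, since |A_hat| = 16
```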

Finally, we can rewrite Eq. (27) as follows:

$$\begin{aligned} Z_i=\sum _{j=1}^M \bar{\lambda }_j\prod _{s=1}^{j-1}(1-\bar{\lambda }_s)h\left( {\varvec{\alpha }}_j^*\right) \frac{(2\pi )^{D/2}}{|\widehat{A}_j|^{1/2}} \end{aligned}$$

(36)
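A possible way to assemble Eq. (36) numerically is to sum the per-component terms with the log-sum-exp trick, since $h({\varvec{\alpha }}_j^*)$ can be very small. The Python sketch below assumes that $\ln h({\varvec{\alpha }}_j^*)$ and $\ln |\widehat{A}_j|$ have already been computed for each component and that every $\bar{\lambda }_j$ lies strictly between 0 and 1; the function name `log_z_i` and the example inputs are hypothetical.

```python
import math


def log_z_i(lam_bar, log_h_star, log_det_A_hat, D):
    """ln Z_i from Eq. (36), summed over components with log-sum-exp.

    lam_bar: expected values of lambda_j, each assumed to lie in (0, 1);
    log_h_star[j]: ln h(alpha_j*); log_det_A_hat[j]: ln|A_hat_j|; D: dimension.
    """
    terms = []
    log_remaining = 0.0  # ln prod_{s<j} (1 - lam_bar[s])
    for lam, lh, ld in zip(lam_bar, log_h_star, log_det_A_hat):
        log_weight = math.log(lam) + log_remaining
        terms.append(log_weight + lh + 0.5 * D * math.log(2.0 * math.pi) - 0.5 * ld)
        log_remaining += math.log1p(-lam)
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Hypothetical inputs chosen so that each Laplace factor equals 1,
# leaving only the mixing weights 0.5 and 0.25.
D = 2
log_det = D * math.log(2.0 * math.pi)
z = math.exp(log_z_i([0.5, 0.5], [0.0, 0.0], [log_det, log_det], D))
print(z)
```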

About this article

Cite this article

Fan, W., Bouguila, N. Infinite Dirichlet mixture models learning via expectation propagation. Adv Data Anal Classif 7, 465–489 (2013). https://doi.org/10.1007/s11634-013-0152-4

Received: 11 March 2013
Revised: 20 August 2013
Accepted: 04 September 2013
Published: 21 September 2013
Issue Date: December 2013
DOI: https://doi.org/10.1007/s11634-013-0152-4
