Abstract
Graphical models have been employed in a wide variety of computer vision tasks. In typical models, however, the latent-variable assignments obtained by sampling are often hard to interpret. In this paper we present discriminative sequential association Latent Dirichlet Allocation, a novel statistical model for visual recognition, with a particular focus on the case of few training examples. By introducing switching variables and formulating a direct discriminative analysis, we treat sequential associations as priors and use them to establish a relevance determination mechanism that yields reasonable assignments of latent variables and avoids invalid labeling oscillations. We evaluate our model on two commonly used datasets. The experimental results show that it achieves better performance with efficient convergence while also giving interpretable topic assignments.
Notes
We write ‘LDA-50’ for brevity: the name of the graphical model followed by the given number of sampling iterations. The same convention applies to all model names below.
We use KBoW, LBoW, KSPM and LSPM as shorthand for “BoW representation (an image represented as an orderless collection of local features) + RBF kernel SVM”, “BoW representation + linear SVM”, “SPM representation (the image partitioned into sub-regions, with a histogram of local features computed inside each sub-region) + RBF kernel SVM” and “SPM representation + linear SVM”, respectively.
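The two representations named in this note can be illustrated with a minimal sketch. It assumes local features have already been quantised into integer visual-word indices with known pixel positions; the function names, the grid layout (2^l x 2^l cells at level l) and the unweighted concatenation of cell histograms are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

def bow_histogram(word_ids, vocab_size):
    """Orderless BoW: count visual-word occurrences, then L1-normalise."""
    word_ids = np.asarray(word_ids, dtype=int)
    hist = np.bincount(word_ids, minlength=vocab_size).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

def spm_histogram(word_ids, xs, ys, width, height, vocab_size, levels=2):
    """Simplified SPM: concatenate one BoW histogram per grid cell.

    Assumption: level l splits the image into 2**l x 2**l cells and
    all cells are weighted equally (no per-level pyramid weights).
    """
    word_ids = np.asarray(word_ids, dtype=int)
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    parts = []
    for level in range(levels):
        cells = 2 ** level
        # Map each feature position to a cell index, clamping the border.
        col = np.minimum((xs / width * cells).astype(int), cells - 1)
        row = np.minimum((ys / height * cells).astype(int), cells - 1)
        for r in range(cells):
            for c in range(cells):
                mask = (row == r) & (col == c)
                parts.append(bow_histogram(word_ids[mask], vocab_size))
    return np.concatenate(parts)

# Toy image: four features, a vocabulary of four visual words.
words = [0, 1, 1, 3]
bow = bow_histogram(words, vocab_size=4)
spm = spm_histogram(words, xs=[1, 9, 2, 8], ys=[1, 1, 9, 9],
                    width=10, height=10, vocab_size=4, levels=2)
```

In this sketch either vector would then be passed to an SVM with an RBF or a linear kernel, giving the four baseline combinations listed above. The BoW vector discards all position information, whereas the SPM vector keeps a coarse spatial layout at the cost of a longer descriptor (here 4 + 16 = 20 dimensions).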
Acknowledgments
This work was supported by the National Natural Science Foundation of China under Grant 61273237, the Fundamental Research Funds for the Central Universities under Grant 2012HGCX0001, and the National Basic Research Program of China (973 Program) under Grant 2013CB329604.
Cite this article
Yao, TT., Xie, Z., Gao, J. et al. Discriminative sequential association latent dirichlet allocation for visual recognition. Pattern Anal Applic 19, 719–730 (2016). https://doi.org/10.1007/s10044-014-0444-0
DOI: https://doi.org/10.1007/s10044-014-0444-0