
Discriminative sequential association latent dirichlet allocation for visual recognition

  • Theoretical Advances
Pattern Analysis and Applications

Abstract

Graphical models have been employed in a wide variety of computer vision tasks. In typical models, however, the latent-variable assignments obtained by sampling often suffer from ambiguous explanations. In this paper we present discriminative sequential association Latent Dirichlet Allocation, a novel statistical model for visual recognition, with a particular focus on the case of few training examples. By introducing switching variables and formulating a direct discriminative analysis, the model treats the sequential associations as a prior, establishing a relevance determination mechanism that yields reasonable assignments of latent variables and avoids invalid labeling oscillations. We demonstrate the power of our model on two commonly used datasets; the experimental results show that it achieves better performance with efficient convergence while also giving good interpretations of specific topic assignments.
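The abstract builds on standard LDA, whose latent topic assignments are inferred by sampling. As background only, the following is a minimal collapsed Gibbs sampler for plain LDA (the model of Blei, Ng and Jordan, 2003, with the collapsed sampler of Griffiths and Steyvers, 2004). It is not the paper's discriminative sequential association model: it has no switching variables and no relevance determination. The function name, the hyperparameter defaults and the per-sweep change count (a crude, assumed proxy for the "labeling oscillations" the abstract mentions) are illustrative choices.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for standard LDA (illustrative sketch).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (z, changes_per_sweep): z[d][i] is the topic of token i in doc d,
    and changes_per_sweep[t] counts how many token labels changed in sweep t.
    """
    rng = random.Random(seed)
    # Count tables: document-topic, topic-word, and per-topic totals.
    n_dk = [[0] * n_topics for _ in docs]
    n_kw = [[0] * vocab_size for _ in range(n_topics)]
    n_k = [0] * n_topics

    # Initialise every token with a uniformly random topic.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    changes_per_sweep = []
    for _ in range(n_iters):
        changes = 0
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                old = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][old] -= 1
                n_kw[old][w] -= 1
                n_k[old] -= 1
                # Unnormalised full conditional p(z_di = k | all other labels).
                weights = [
                    (n_dk[d][k] + alpha)
                    * (n_kw[k][w] + beta)
                    / (n_k[k] + vocab_size * beta)
                    for k in range(n_topics)
                ]
                new = rng.choices(range(n_topics), weights=weights)[0]
                # Add the token back under its newly sampled topic.
                n_dk[d][new] += 1
                n_kw[new][w] += 1
                n_k[new] += 1
                z[d][i] = new
                changes += int(new != old)
        changes_per_sweep.append(changes)
    return z, changes_per_sweep
```

A run with `n_iters=50` is the kind of fixed-iteration run that Note 1 abbreviates as ‘LDA-50’. A change count that stays high across sweeps signals labels that keep flipping rather than settling.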


Figs. 1–12


Notes

  1. For brevity, we write ‘LDA-50’ for a particular graphical model (here LDA) run for a given number of sampling iterations (here 50). The same convention applies to all such labels below.

  2. We use KBoW, LBoW, KSPM and LSPM as shorthand for “BoW representation (an image represented as an orderless collection of local features) + RBF kernel SVM”, “BoW representation + linear SVM”, “SPM representation (the image partitioned into sub-regions, with a histogram of the local features computed inside each sub-region) + RBF kernel SVM” and “SPM representation + linear SVM”, respectively.
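The four baselines in Note 2 combine two image representations with two SVM kernels. The following minimal pure-Python sketch shows those ingredients, assuming local descriptors have already been extracted (for example SIFT) and a visual codebook has already been learned (for example by k-means). The hard nearest-codeword assignment, the unweighted pyramid (Lazebnik et al.'s spatial pyramid matching also applies per-level weights, omitted here) and all function names are illustrative assumptions. SVM training itself is not shown; the two kernel functions are what an RBF or linear SVM would evaluate on pairs of histograms.

```python
import math


def nearest_codeword(desc, codebook):
    # Index of the codeword with the smallest squared Euclidean distance.
    return min(
        range(len(codebook)),
        key=lambda j: sum((a - b) ** 2 for a, b in zip(desc, codebook[j])),
    )


def bow_histogram(descriptors, codebook):
    """Orderless BoW: count how many local descriptors map to each codeword."""
    hist = [0] * len(codebook)
    for desc in descriptors:
        hist[nearest_codeword(desc, codebook)] += 1
    return hist


def spm_histogram(features, codebook, width, height, levels=2):
    """Spatial pyramid: concatenate per-cell BoW histograms of a
    2^l x 2^l grid for every level l = 0..levels (no level weighting).

    features: list of ((x, y), descriptor) with 0 <= x < width, 0 <= y < height.
    """
    words = [((x, y), nearest_codeword(d, codebook)) for (x, y), d in features]
    out = []
    for level in range(levels + 1):
        cells = 2 ** level
        grid = [[0] * len(codebook) for _ in range(cells * cells)]
        for (x, y), j in words:
            # Map the feature's position to its grid cell at this level.
            cx = min(int(x * cells / width), cells - 1)
            cy = min(int(y * cells / height), cells - 1)
            grid[cy * cells + cx][j] += 1
        for cell_hist in grid:
            out.extend(cell_hist)
    return out


def rbf_kernel(h1, h2, gamma=1.0):
    # k(h1, h2) = exp(-gamma * ||h1 - h2||^2), used by the "K" variants.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(h1, h2)))


def linear_kernel(h1, h2):
    # Plain inner product, used by the "L" variants.
    return sum(a * b for a, b in zip(h1, h2))
```

With a two-word codebook and `levels=1`, the SPM vector has 2 × (1 + 4) = 10 entries: the whole-image histogram followed by one histogram per quadrant.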


Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 61273237, the Fundamental Research Funds for the Central Universities under Grant 2012HGCX0001, and the National Basic Research Program of China (973 Program) under Grant 2013CB329604.

Author information

Corresponding author

Correspondence to Zhao Xie.


About this article


Cite this article

Yao, TT., Xie, Z., Gao, J. et al. Discriminative sequential association latent dirichlet allocation for visual recognition. Pattern Anal Applic 19, 719–730 (2016). https://doi.org/10.1007/s10044-014-0444-0


  • DOI: https://doi.org/10.1007/s10044-014-0444-0
