Fuzzy bag of words for social image description

Multimedia Tools and Applications

Abstract

The rapid growth of social media resources brings both major challenges and opportunities for image description technologies. The performance of an image description method directly affects the accuracy of image retrieval, image annotation, and image recognition. Bag of Words (BoW), an efficient approach to describing images, has been attracting increasing attention. However, in traditional BoW, the mapping between the words in the codebook and the features extracted from the images is ambiguous. Because Fuzzy Sets Theory (FST) is a powerful means of dealing with uncertainty, we use FST to address the problem caused by the ambiguity between features and words. Accordingly, we propose a new type of BoW, named FBoW, that describes images based on FST. First, features are extracted from the images. Second, k-means is used to learn the codebook. Third, a fuzzy membership function is designed to measure the similarity between the features and the words. The optimal parameters of the fuzzy membership function are obtained using a Genetic Algorithm (GA). Finally, the histogram that describes an image is generated by summing, for each word, the fuzzy membership values of the image's features. The experimental results show that the proposed FBoW outperforms traditional BoW for social image description.
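The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the fuzzy membership function, so the Gaussian membership below, its `sigma` parameter, and all function names are assumptions made purely for illustration; they are not the authors' implementation. In the paper the membership parameters are tuned by a GA, whereas here `sigma` is simply fixed by hand, and the 2-D tuples stand in for real local image descriptors.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(feature, codebook):
    """Index of the codebook word closest to the feature."""
    return min(range(len(codebook)), key=lambda i: sq_dist(feature, codebook[i]))


def kmeans(features, k, iters=20, seed=0):
    """Plain k-means; returns k cluster centers used as the codebook."""
    rng = random.Random(seed)
    centers = rng.sample(features, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in features:
            clusters[nearest(f, centers)].append(f)
        for i, members in enumerate(clusters):
            if members:  # keep the old center if a cluster is empty
                centers[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers


def gaussian_membership(feature, word, sigma):
    """Assumed membership function: 1.0 at the word, decaying with distance."""
    return math.exp(-sq_dist(feature, word) / (2.0 * sigma ** 2))


def bow_histogram(features, codebook):
    """Traditional BoW: each feature votes for its single nearest word."""
    hist = [0.0] * len(codebook)
    for f in features:
        hist[nearest(f, codebook)] += 1.0
    return hist


def fbow_histogram(features, codebook, sigma=1.0):
    """Fuzzy BoW: each word accumulates the membership of every feature."""
    hist = [0.0] * len(codebook)
    for f in features:
        for i, word in enumerate(codebook):
            hist[i] += gaussian_membership(f, word, sigma)
    return hist


if __name__ == "__main__":
    # Toy 2-D "features" forming two loose groups.
    toy = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (3.0, 3.1), (2.9, 3.2), (1.5, 1.6)]
    words = kmeans(toy, k=2)
    print("hard BoW :", bow_histogram(toy, words))
    print("fuzzy BoW:", fbow_histogram(toy, words, sigma=1.0))
```

The point of the comparison is that a feature lying between two words, such as `(1.5, 1.6)` in the toy data, contributes to both bins in the fuzzy histogram instead of being forced into one, which is the ambiguity the abstract refers to.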

Acknowledgments

The research was supported by National Natural Science Funds of China (Nos. 61125106, 61372007, 91120302, and 61072093), Guangdong Provincial Project of Transportation Science and Technology (No. 2012-02-084), Natural Science Funds of Guangdong Province (No. S2012010009885), the Fundamental Research Funds for the Central Universities (No. 2014ZG0038), Projects of innovative science and technology, Department of Education, Guangdong Province (No. 2013KJCX0012), and Shaanxi Key Innovation Team of Science and Technology (Grant No.: 2012KCT-04).

Author information

Corresponding author

Correspondence to Qinghua Huang.

About this article

Cite this article

Li, Y., Liu, W., Huang, Q. et al. Fuzzy bag of words for social image description. Multimed Tools Appl 75, 1371–1390 (2016). https://doi.org/10.1007/s11042-014-2138-4

  • DOI: https://doi.org/10.1007/s11042-014-2138-4
