Abstract
The rapid growth of social media resources brings both challenges and opportunities for image description technologies. The performance of an image description method directly affects the accuracy of image retrieval, image annotation, and image recognition. Bag of Words (BoW), an efficient approach to describing images, has attracted growing attention. However, in traditional BoW, the mapping between the words in the codebook and the features extracted from the images is ambiguous. Because Fuzzy Sets Theory (FST) is a powerful means of dealing with uncertainty, we use it to address the problem caused by the ambiguity between features and words. Accordingly, we propose a new type of BoW based on FST, named FBoW, to describe images. First, features are extracted from the images. Second, k-means is used to learn the codebook. Third, a fuzzy membership function is designed to measure the similarity between features and words; its optimal parameters are obtained with a Genetic Algorithm (GA). Finally, a histogram describing each image is generated by summing the fuzzy membership values for each word. Experimental results show that the proposed FBoW outperforms traditional BoW for social image description.
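The difference between hard assignment in traditional BoW and the fuzzy histogram described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the membership function, so the Gaussian membership below, its normalization, and the fixed `sigma` parameter are assumptions made for illustration only; in the paper the parameters are tuned by a GA, which is not shown here. The function names are hypothetical, and the codebook is taken as given rather than learned with k-means.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def hard_bow_histogram(features, codebook):
    """Traditional BoW: each feature votes only for its nearest word."""
    hist = [0.0] * len(codebook)
    for f in features:
        dists = [squared_distance(f, w) for w in codebook]
        hist[dists.index(min(dists))] += 1.0
    return hist


def fuzzy_bow_histogram(features, codebook, sigma=1.0):
    """Fuzzy BoW sketch: each feature adds a membership value to every word.

    ASSUMPTION: membership is a Gaussian of the feature-to-word distance,
    normalized so that the memberships of one feature sum to 1. The paper's
    actual membership function may differ.
    """
    hist = [0.0] * len(codebook)
    for f in features:
        dists = [squared_distance(f, w) for w in codebook]
        d_min = min(dists)
        # Subtracting d_min keeps the largest term at exp(0) = 1, which
        # avoids division by zero when all distances are large.
        raw = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in dists]
        total = sum(raw)
        for j, r in enumerate(raw):
            hist[j] += r / total
    return hist


# Toy data: two words and three 2-D features, one of them equidistant.
codebook = [(0.0, 0.0), (4.0, 0.0)]
features = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0)]

hard_hist = hard_bow_histogram(features, codebook)
fuzzy_hist = fuzzy_bow_histogram(features, codebook, sigma=1.0)
```

In this toy case the feature at `(2.0, 0.0)` is equally close to both words. Hard assignment gives its whole vote to the first word simply because of tie-breaking order, while the fuzzy histogram splits the contribution evenly between the two words.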
Acknowledgments
The research was supported by National Natural Science Funds of China (Nos. 61125106, 61372007, 91120302, and 61072093), Guangdong Provincial Project of Transportation Science and Technology (No. 2012-02-084), Natural Science Funds of Guangdong Province (No. S2012010009885), the Fundamental Research Funds for the Central Universities (No. 2014ZG0038), Projects of innovative science and technology, Department of Education, Guangdong Province (No. 2013KJCX0012), and Shaanxi Key Innovation Team of Science and Technology (Grant No.: 2012KCT-04).
About this article
Cite this article
Li, Y., Liu, W., Huang, Q. et al. Fuzzy bag of words for social image description. Multimed Tools Appl 75, 1371–1390 (2016). https://doi.org/10.1007/s11042-014-2138-4
DOI: https://doi.org/10.1007/s11042-014-2138-4