A multimodal query expansion based on genetic programming for visually-oriented e-commerce applications
Introduction
Recent technological advances have created new opportunities for content-based image retrieval (CBIR) applications. Such is the case for systems that search for products in online stores, popularized by the widespread use of mobile devices, such as tablets and smartphones, that can access applications using remote resources, as well as by modern and secure payment mechanisms that allow users to buy goods from their mobile devices.
Contributing to this scenario, an increasing number of e-commerce companies demand constant improvements in the search technologies that support their operations. Additional evidence that this problem is growing in importance can be found in recent reports summarizing e-commerce activity. These reports show a fast-growing market share for segments such as fashion, apparel, and accessories, along with furniture and home furnishings, where visual search can play an important role. For instance, the last two segments combined represented approximately 25% of e-commerce sales in the US in 2012, according to the eMarketer report (eMarketer, 2012). Apparel, including clothing and accessories, is the second-biggest category in US e-commerce and is growing in importance in most of the largest e-commerce markets worldwide.
In this context, we propose a new machine-learning CBIR approach for visually-oriented e-commerce applications in which image search is a key component. In this specific domain, we consider that the system does not have access to textual information from the user. A common scenario occurs, for instance, when a user takes a photo of a product in a store with a cell phone and wants to find items similar to the one found at the store. This type of query is useful when the user is searching for products such as clothing, shoes, handbags, watches, and accessories, whose visual presentation is essential for the consumer's purchase decision. Considering that there are usually several visually similar products in the product database, visual search can help in the specific task of finding the desired product, improving the user experience. Examples of our target application include popular apps such as ebay fashion and look4color in the US, and netshoes click in Brazil, in which users search for products by giving photos taken of other products as input. Such applications are growing in importance as mobile devices with online access to apps become increasingly available.
One of the key properties that distinguishes our target application from traditional image search applications is the final goal of the search task. The query in our target application is an image that represents a product, but what makes a product relevant, or not, to the user may be encoded in attributes not present in its image. Products in the database may thus be relevant to the user even when their image representation is not very close to the image provided in the query. Examples of this problem include differences in how product photos are produced, such as folded versus unfolded t-shirts. Another example occurs when products with distinct colors and textures are considered relevant in specific cases, given their styles.
Our proposed solution finds relevant products related to an image query even when their image representation is not (very) similar to the query. Our solution is to perform a multimodal expansion, using the initial query to infer other attributes relevant to it, such as the product's category and textual description. The key idea is to use the visual information to produce an initial ranking and then extract accurate multimodal information from the results, which is used to expand the initial query.
Our strategy exploits a self-reranking machine learning approach based on automatic multimodal query expansion. The challenge here is that, unlike most previous work, we do not assume that the query is defined in terms of both image visual content and textual descriptions. In fact, the constraints usually imposed by our target application imply that only the visual information is available, i.e., a photo of the desired product. The key idea of our method is to expand the initial image query with the inferred category of the query image along with textual content automatically associated with other visually similar images. This type of information is usually largely available in online product catalogs, helping us improve the quality of the overall retrieval process. Thus, we expand the initial image query with multimodal information and produce a new ranking based on the expanded query.
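The expansion pipeline described above can be sketched roughly as follows. All names here (`visual_sim`, the catalog fields, the weight `alpha`, the majority-vote category inference, the term-overlap scoring) are illustrative assumptions for exposition, not the authors' actual implementation:

```python
from collections import Counter

def expand_and_rerank(query_image, catalog, visual_sim, k=10, alpha=0.5):
    """Rank the catalog by visual similarity, infer a category and
    expansion terms from the top-k results, then re-rank the catalog
    with the multimodally expanded query."""
    # 1. Initial ranking using only the visual content of the query.
    initial = sorted(catalog,
                     key=lambda p: visual_sim(query_image, p["image"]),
                     reverse=True)
    top_k = initial[:k]

    # 2. Infer the query's category as the majority category of the top-k.
    category = Counter(p["category"] for p in top_k).most_common(1)[0][0]

    # 3. Collect frequent terms from the textual descriptions of the top-k.
    terms = Counter(w for p in top_k for w in p["text"].split())
    expansion_terms = {w for w, _ in terms.most_common(5)}

    # 4. Re-rank: combine visual similarity with the expanded evidence.
    def score(p):
        overlap = len(expansion_terms & set(p["text"].split()))
        text_score = overlap / max(len(expansion_terms), 1)
        cat_bonus = 1.0 if p["category"] == category else 0.0
        return (alpha * visual_sim(query_image, p["image"])
                + (1 - alpha) * (text_score + cat_bonus) / 2)

    return sorted(catalog, key=score, reverse=True)
```

A simple linear combination is used here for the re-ranking step; in the paper this combination is instead evolved by Genetic Programming.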
The expansion and the actual ranking of images exploit a Genetic Programming (GP) approach to perform both the expansion of the initial query and the computation of a new ranking based on it. We propose and experiment with four alternatives for using GP to derive multimodal query expansion methods. GP is used to find the best possible multimodal combination of the available pieces of evidence. We chose GP for a number of reasons, including: (i) its excellent effectiveness in previous CBIR studies (Andrade, Almeida, Pedrini, & da S. Torres, 2012; Calumby, da S. Torres, & Gonçalves, 2014; Faria, Veloso, Almeida, Valle, da S. Torres, Gonçalves, Jr., 2010; Ferreira, dos Santos, da S. Torres, Gonçalves, Rezende, & Fan, 2011; Torres, Falcão, Gonçalves, Papa, Zhang, Fan, & Fox, 2009), mainly when exploiting multimodal information; (ii) its capability to find near-optimal solutions in large search spaces, as is the case here; and (iii) its capability to deal with multiple objectives at the same time (in our case, query expansion and effective ranking). After learning a ranking function in an offline process with GP, we apply the function at query processing time without any extra overhead. As far as we know, GP has never been applied in the scenario we deal with in this paper, i.e., where only the visual information is initially available for multimodal query expansion. The challenge of exploiting only visual aspects comes from the difficulty of mapping low-level features obtained by image processing algorithms to the high-level concepts found in images, the well-known semantic gap problem (Liu, Zhang, Lu, & Ma, 2007).
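As a rough illustration of how a GP individual can combine multimodal pieces of evidence into a single ranking score, consider a tiny expression-tree evaluator. The operator set and the feature names (`visual`, `text`, `category`) are assumptions for illustration, not the terminals and functions used in the paper:

```python
import operator

# Binary operators available as internal nodes of a GP individual.
OPS = {"+": operator.add, "*": operator.mul, "max": max, "min": min}

def evaluate(tree, evidence):
    """Recursively evaluate a GP expression tree on a dict of
    per-item evidence scores (e.g. {'visual': 0.8, 'text': 0.3})."""
    if isinstance(tree, str):           # terminal: an evidence feature
        return evidence[tree]
    if isinstance(tree, (int, float)):  # terminal: a constant
        return tree
    op, left, right = tree              # internal node: binary operator
    return OPS[op](evaluate(left, evidence), evaluate(right, evidence))

# Example individual: visual * 0.7 + max(text, category)
individual = ("+", ("*", "visual", 0.7), ("max", "text", "category"))
score = evaluate(individual, {"visual": 0.9, "text": 0.2, "category": 1.0})
```

GP then searches over the space of such trees, selecting and recombining the individuals that produce the most effective rankings on training queries.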
Experimental results indicate that our new GP multimodal query expansion approach is able to significantly improve the overall quality of results of e-commerce visual search applications when compared to the application of GP without expansion. This demonstrates that the idea of performing a multimodal machine learning based automatic expansion for image queries is very promising.
In previous work (dos Santos, Cavalcanti, Saraiva, & de Moura, 2013), we addressed a similar problem, i.e., searching for products using only image queries. At that time, however, no learning paradigm was explored. Machine learning solutions are both (i) more principled, with a solid theoretical background, and (ii) more flexible, able to easily accommodate other types of multimodal evidence when available. In addition, our experimental results show that the GP-based solution outperforms our previous efforts by up to 54% in terms of Mean Average Precision.
Moreover, a comparison with alternative learning-to-rank (L2R) techniques applied to the same problem, using the same experimental design, reveals the superiority of GP over these L2R alternatives: it produces considerable gains over the best baselines (up to 18%) with no losses in any of the envisioned scenarios.
This work contributes to the problem of image search with an alternative use of Genetic Programming as a multimodal query expansion method and with its study in a new and emerging practical application. While at first glance it is not obvious that expanding image queries with multimodal information, without any user feedback, would yield improvements, we show that for our target application this strategy is effective, mainly when coupled with flexible and effective soft computing strategies, and can ultimately be used in practical applications. Our results also reveal opportunities for applying the same strategy in other emerging areas and for applying other soft computing methods to our target application.
This paper is organized as follows. In the next section, we describe related work on image search and product image search, including automatic expansion techniques previously proposed in the literature. Section 3 presents the visual features and the image datasets used in our work. Section 4 gives an overview of the genetic programming approach and discusses the method we propose for expanding and re-ranking image queries, showing how GP can simultaneously perform a multimodal expansion and derive good multimodal ranking functions. Section 5 presents the experiments performed to assess the impact of the proposed method. Finally, Section 6 concludes the article and introduces possible future work directions.
Related work
Image retrieval has been extensively studied in recent decades. Kherfi, Ziou, and Bernardi (2004) provide a comprehensive survey on Web image retrieval systems, giving details on the main issues that have to be addressed during their implementation (e.g., how to perform data gathering, visual feature extraction, indexing, retrieving, and performance evaluation). Below we discuss visual search research work involving one or more of the following techniques: learning-to-rank, genetic programming,
Motivational study: visual search in visually-oriented E-commerce
We start with a small set of motivational investigations, as the impact of applying CBIR techniques in visually-oriented e-commerce applications, such as those found in the fashion domain, can potentially be very high. This e-commerce market is growing rapidly worldwide, and access to it through mobile devices with high-quality image capture features is quickly becoming popular. In the following, we investigate the potential use of traditional CBIR techniques for the envisioned
Genetic programming for query expansion
Genetic Programming (GP) is an evolutionary methodology introduced by Koza (1992). It is a problem-solving technique based on the principles of biological inheritance and the evolution of individuals in a population. The search space of a problem, i.e., the space of all possible solutions to the problem, is explored by applying a set of operations that follow the theory of evolution, combining natural selection with genetic operations to create more diverse and better-performing individuals in
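The generational loop just outlined can be sketched on a toy symbolic-regression task as follows. The population size, genetic operators, selection scheme, and fitness function are illustrative choices for this sketch, not the parameters used in the paper:

```python
import random

random.seed(42)

# Function set for internal nodes; terminals are 'x' or a constant.
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def random_tree(depth=2):
    """Grow a random expression tree up to the given depth."""
    if depth == 0 or random.random() < 0.3:
        return "x" if random.random() < 0.5 else random.uniform(-1, 1)
    return [random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1)]

def evaluate(t, x):
    if t == "x":
        return x
    if isinstance(t, float):
        return t
    return OPS[t[0]](evaluate(t[1], x), evaluate(t[2], x))

def fitness(t):
    """Negative squared error against the target f(x) = x*x + x."""
    xs = [i / 10 for i in range(-10, 11)]
    return -sum((evaluate(t, x) - (x * x + x)) ** 2 for x in xs)

def crossover(a, b):
    """Simplified crossover: combine children of the two parents."""
    if isinstance(a, list) and isinstance(b, list):
        return [a[0], a[1], b[2]]
    return b

def mutate(t):
    """Occasionally replace an individual with a fresh random tree."""
    return random_tree(2) if random.random() < 0.2 else t

pop = [random_tree() for _ in range(50)]
for gen in range(30):
    pop.sort(key=fitness, reverse=True)
    elite = pop[:10]                    # selection: keep the best
    children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                for _ in range(40)]     # genetic operations
    pop = elite + children              # next generation

best = max(pop, key=fitness)
```

In the paper, the individuals are instead ranking functions combining multimodal evidence, and fitness is measured by retrieval effectiveness on training queries.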
Experiments
In this section, we report the experiments performed to validate our multimodal learning-to-rank approach. As already mentioned, we adopted two metrics to evaluate the methods: P@10 and MAP. We applied Student’s t-test to check whether the differences in the results obtained by each method are statistically significant, considering p < 0.05. Thus, whenever we mention statistically significant differences when comparing two methods, we
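For reference, the two metrics can be computed as in the following minimal sketch, where `ranked` is the retrieved list for one query and `relevant` is the set of items judged relevant for it; MAP is the average of the per-query average precision:

```python
def precision_at_k(ranked, relevant, k=10):
    """P@k: fraction of the top-k retrieved items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def average_precision(ranked, relevant):
    """AP: mean of the precision values at each rank where a
    relevant item appears, normalized by the number of relevant items."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0
```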
Conclusions and future work
We proposed automatically expanding visual queries in CBIR systems using evolutionary approaches to improve the quality of results when multimodal information related to the images is available. We implemented and experimented with this idea in the scenario of searching for fashion products and accessories on e-commerce Web sites. This application is becoming very relevant, given the growth in sales of this sort of product and the popularization of mobile devices capable of capturing images.
Acknowledgments
The authors thank the following funding agencies for their financial support: CAPES, CNPq (grant nos. 306580/2012-8, 308130/2014-6, and InWeb 573871/2008-6), FAPESP, FAPEAM (FIXAM and E-Vox Project), and FAPEMIG.
References
- et al. (2015). Improving distance based image retrieval using non-dominated sorting genetic algorithm. Pattern Recognition Letters.
- et al. (2011). Relevance feedback based on genetic programming for image retrieval. Pattern Recognition Letters.
- et al. (2007). A survey of content-based image retrieval with high-level semantics. Pattern Recognition.
- et al. (2011). Active learning paradigms for CBIR systems based on optimum-path forest classification. Pattern Recognition.
- et al. (2009). A genetic programming framework for content-based image retrieval. Pattern Recognition.
- et al. (2006). CSIFT: a SIFT descriptor with color invariant characteristics. IEEE CVPR.
- et al. (2012). Fusion of local and global descriptors for content-based image and video retrieval. Iberoamerican Congress on Pattern Recognition.
- et al. (2011). Dynamic two-stage image retrieval from large multimodal databases. Advances in Information Retrieval.
- et al. (2011). Modern information retrieval: the concepts and technology behind search.
- et al. (2007). Representing shape with a spatial pyramid kernel. ACM CIVR.
- Statistics for experimenters: an introduction to design, data analysis, and model building.
- Multimodal retrieval with relevance feedback based on genetic programming. Multimedia Tools and Applications.
- A comparative study of learning-to-rank techniques for tag recommendation. Journal of Information and Data Management.
- CEDD: color and edge directivity descriptor: a compact descriptor for image indexing and retrieval. ICVS.
- FCTH: fuzzy color and texture histogram, a low-level feature for accurate image retrieval. WIAMIS.
- Selection of the proper compact composite descriptor for improving content based image retrieval. Proceedings of the 6th IASTED International Conference.
- iLike: integrating visual and textual features for vertical search. ACM MM.
- Total recall: automatic query expansion with a generative feature model for object retrieval. ICCV.
- Semantic combination of textual and visual information in multimedia retrieval. ACM ICMR.
- LePref: learn to precompute evidence fusion for efficient query evaluation. Journal of the Association for Information Science and Technology.
- Real time Google and Live image search re-ranking. ACM MM.
- Learning to rank for content-based image retrieval. ACM MIR.
- Using factorial experiments to evaluate the effect of genetic programming parameters. EuroGP.
- An efficient boosting algorithm for combining preferences. The Journal of Machine Learning Research.