A multimodal query expansion based on genetic programming for visually-oriented e-commerce applications
Introduction
Recent technological advances have created new opportunities for content-based image retrieval (CBIR) applications. Such is the case for systems that search for products in online stores, popularized by the widespread use of mobile devices, such as tablets and smartphones, that can access applications using remote resources, as well as by modern and secure payment mechanisms that allow users to buy goods from their mobile devices.
Contributing to this scenario, an increasing number of e-commerce companies demand constant improvements in the search technologies that support their operations. Additional evidence that this problem is growing in importance can be found in recent reports summarizing e-commerce activity. These reports show a fast-growing market share for segments such as fashion, apparel, and accessories, along with furniture and home furnishings, where visual search can play an important role. For instance, the last two segments combined represented approximately 25% of e-commerce sales in the US in 2012, according to the eMarketer report (eMarketer, 2012). Apparel, including clothing and accessories, is the second-biggest category in US e-commerce and is growing in importance in most of the largest e-commerce markets worldwide.
In this context, we propose a new machine-learning CBIR approach for visually-oriented e-commerce applications in which image search is a key component. In this specific domain, we consider that the system does not have access to textual information from the user. A common scenario occurs, for instance, when a user takes a photo of a product in a store with a cell phone and wants to find items similar to the one found at the store. This type of query is useful when the user is searching for products such as clothing, shoes, handbags, watches, and accessories, whose visual presentation is essential for the consumer's purchase decision. Considering that there are usually several visually similar products in the product database, visual search can help in the specific task of finding the desired product, improving the user experience. Examples of our target application include popular apps such as ebay fashion and look4color in the US, and netshoes click in Brazil, in which users search for products by giving photos taken of other products as input. Such applications are growing in importance as mobile devices with online access to apps become increasingly available.
One of the key properties that distinguishes our target application from traditional image search applications is the final goal of the search task. The query in our target application is an image that represents a product, but what makes a product relevant, or not, to the user may be encoded in attributes not present in its image. Products in the database may thus be relevant to the user even when their image representation is not very close to the image provided in the query. Examples of this problem include differences in how product photos are produced, such as folded versus unfolded t-shirts. Another example occurs when products with distinct colors and textures are considered relevant in specific cases, given their styles.
Our proposed solution finds relevant products related to an image query even when their image representation is not (very) similar to the query. Our solution is to perform a multimodal expansion, using the initial query to infer other attributes relevant to it, such as the product's category and textual description. The key idea is to use the visual information to produce an initial ranking and then extract accurate multimodal information from the results, which is used to expand the initial query.
Our strategy exploits a self-reranking machine learning approach based on automatic multimodal query expansion. The challenge here is that, unlike most previous work, we do not assume that the query is defined in terms of both image visual content and textual descriptions. In fact, the constraints usually imposed by our target application imply that only the visual information is available, i.e., a photo of the desired product. The key idea of our method is to expand the initial image query with the inferred category of the query image along with textual content automatically associated with other visually similar images. This type of information is usually largely available in online product catalogs, helping us improve the quality of the overall retrieval process. Thus, we expand the initial image query with multimodal information and produce a new ranking based on the expanded query.
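The expansion pipeline described above can be sketched roughly as follows. All names here (`visual_sim`, the catalog fields, the weight `alpha`, the majority-vote category inference, the term-overlap scoring) are illustrative assumptions for exposition, not the authors' actual implementation:

```python
from collections import Counter

def expand_and_rerank(query_image, catalog, visual_sim, k=10, alpha=0.5):
    """Rank the catalog by visual similarity, infer a category and
    expansion terms from the top-k results, then re-rank the catalog
    with the multimodally expanded query."""
    # 1. Initial ranking using only the visual content of the query.
    initial = sorted(catalog,
                     key=lambda p: visual_sim(query_image, p["image"]),
                     reverse=True)
    top_k = initial[:k]

    # 2. Infer the query's category as the majority category of the top-k.
    category = Counter(p["category"] for p in top_k).most_common(1)[0][0]

    # 3. Collect frequent terms from the textual descriptions of the top-k.
    terms = Counter(w for p in top_k for w in p["text"].split())
    expansion_terms = {w for w, _ in terms.most_common(5)}

    # 4. Re-rank: combine visual similarity with the expanded evidence.
    def score(p):
        overlap = len(expansion_terms & set(p["text"].split()))
        text_score = overlap / max(len(expansion_terms), 1)
        cat_bonus = 1.0 if p["category"] == category else 0.0
        return (alpha * visual_sim(query_image, p["image"])
                + (1 - alpha) * (text_score + cat_bonus) / 2)

    return sorted(catalog, key=score, reverse=True)
```

A simple linear combination is used here for the re-ranking step; in the paper this combination is instead evolved by Genetic Programming.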
The expansion and the actual ranking of images exploit a Genetic Programming (GP) approach to perform both the expansion of the initial query and the computation of a new ranking based on it. We propose and experiment with four alternatives for using GP to derive multimodal query expansion methods. GP is used to find the best possible multimodal combination of the available pieces of evidence. We chose GP for a number of reasons, including: (i) its excellent effectiveness in previous CBIR studies (Andrade, Almeida, Pedrini, & da S. Torres, 2012; Calumby, da S. Torres, & Gonçalves, 2014; Faria, Veloso, Almeida, Valle, da S. Torres, Gonçalves, Jr., 2010; Ferreira, dos Santos, da S. Torres, Gonçalves, Rezende, & Fan, 2011; Torres, Falcão, Gonçalves, Papa, Zhang, Fan, & Fox, 2009), mainly when exploiting multimodal information; (ii) its capability to find near-optimal solutions in large search spaces, as is the case here; and (iii) its capability to deal with multiple objectives at the same time (in our case, query expansion and effective ranking). After learning a ranking function in an offline process with GP, we apply the function at query processing time without any extra overhead. As far as we know, GP has never been applied in the scenario we deal with in this paper, i.e., where only the visual information is initially available for multimodal query expansion. The challenge of exploiting only visual aspects comes from the difficulty of mapping low-level features obtained by image processing algorithms to the high-level concepts found in images, the well-known semantic gap problem (Liu, Zhang, Lu, & Ma, 2007).
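As a rough illustration of how a GP individual can combine multimodal pieces of evidence into a single ranking score, consider a tiny expression-tree evaluator. The operator set and the feature names (`visual`, `text`, `category`) are assumptions for illustration, not the terminals and functions used in the paper:

```python
import operator

# Binary operators available as internal nodes of a GP individual.
OPS = {"+": operator.add, "*": operator.mul, "max": max, "min": min}

def evaluate(tree, evidence):
    """Recursively evaluate a GP expression tree on a dict of
    per-item evidence scores (e.g. {'visual': 0.8, 'text': 0.3})."""
    if isinstance(tree, str):           # terminal: an evidence feature
        return evidence[tree]
    if isinstance(tree, (int, float)):  # terminal: a constant
        return tree
    op, left, right = tree              # internal node: binary operator
    return OPS[op](evaluate(left, evidence), evaluate(right, evidence))

# Example individual: visual * 0.7 + max(text, category)
individual = ("+", ("*", "visual", 0.7), ("max", "text", "category"))
score = evaluate(individual, {"visual": 0.9, "text": 0.2, "category": 1.0})
```

GP then searches over the space of such trees, selecting and recombining the individuals that produce the most effective rankings on training queries.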
Experimental results indicate that our new GP multimodal query expansion approach is able to significantly improve the overall quality of results of e-commerce visual search applications when compared to the application of GP without expansion. This demonstrates that the idea of performing a multimodal machine learning based automatic expansion for image queries is very promising.
In previous work (dos Santos, Cavalcanti, Saraiva, & de Moura, 2013), we addressed a similar problem, i.e., searching for products using only image queries. At that time, however, no learning paradigm was explored. Machine learning solutions are both (i) more principled, with a solid theoretical background, and (ii) more flexible, able to easily accommodate other types of multimodal evidence when available. In addition, our experimental results show that the GP-based solution outperforms our previous efforts by up to 54% in terms of Mean Average Precision.
Moreover, a comparison with alternative learning-to-rank (L2R) techniques applied to the same problem, using the same experimental design, reveals the superiority of GP over these L2R alternatives: it produces considerable gains over the best baselines (up to 18%) with no losses in any of the envisioned scenarios.
This work contributes to the problem of image search with an alternative use of Genetic Programming as a multimodal query expansion method and with its study in a new and emerging practical application. While at first glance it is not obvious that expanding image queries with multimodal information, without any user feedback, would yield improvements, we show that for our target application this strategy is effective, mainly when coupled with flexible and effective soft computing strategies, and can ultimately be used in practical applications. Our results also reveal opportunities for applying the same strategy in other emerging areas and for applying other soft computing methods to our target application.
This paper is organized as follows. In the next section, we describe related work on image search and product image search, including automatic expansion techniques previously proposed in the literature. Section 3 presents the visual features and the image datasets used in our work. Section 4 gives an overview of the genetic programming approach and discusses the method we propose for expanding and re-ranking image queries, showing how GP can simultaneously perform a multimodal expansion and derive good multimodal ranking functions. Section 5 presents the experiments performed to assess the impact of the proposed method. Finally, Section 6 concludes the article and introduces possible future work directions.
Related work
Image retrieval has been extensively studied in recent decades. Kherfi, Ziou, and Bernardi (2004) provide a comprehensive survey on Web image retrieval systems, giving details on the main issues that have to be addressed during their implementation (e.g., how to perform data gathering, visual feature extraction, indexing, retrieving, and performance evaluation). Below we discuss visual search research work involving one or more of the following techniques: learning-to-rank, genetic programming,
Motivational study: visual search in visually-oriented E-commerce
We start with a small set of motivational investigations, as the impact of applying CBIR techniques in visually-oriented e-commerce applications, such as those found in the fashion domain, can potentially be very high. This e-commerce market is growing rapidly worldwide, and access to it through mobile devices with high-quality image capture features is quickly becoming popular. In the following, we investigate the potential use of traditional CBIR techniques for the envisioned
Genetic programming for query expansion
Genetic Programming (GP) is an evolutionary methodology introduced by Koza (1992). It is a problem-solving technique based on the principles of biological inheritance and the evolution of individuals in a population. The search space of a problem, i.e., the space of all possible solutions to the problem, is explored by applying a set of operations that follow the theory of evolution, combining natural selection with genetic operations to create more diverse and better-performing individuals in
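The generational loop just outlined can be sketched on a toy symbolic-regression task as follows. The population size, genetic operators, selection scheme, and fitness function are illustrative choices for this sketch, not the parameters used in the paper:

```python
import random

random.seed(42)

# Function set for internal nodes; terminals are 'x' or a constant.
OPS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

def random_tree(depth=2):
    """Grow a random expression tree up to the given depth."""
    if depth == 0 or random.random() < 0.3:
        return "x" if random.random() < 0.5 else random.uniform(-1, 1)
    return [random.choice(list(OPS)), random_tree(depth - 1), random_tree(depth - 1)]

def evaluate(t, x):
    if t == "x":
        return x
    if isinstance(t, float):
        return t
    return OPS[t[0]](evaluate(t[1], x), evaluate(t[2], x))

def fitness(t):
    """Negative squared error against the target f(x) = x*x + x."""
    xs = [i / 10 for i in range(-10, 11)]
    return -sum((evaluate(t, x) - (x * x + x)) ** 2 for x in xs)

def crossover(a, b):
    """Simplified crossover: combine children of the two parents."""
    if isinstance(a, list) and isinstance(b, list):
        return [a[0], a[1], b[2]]
    return b

def mutate(t):
    """Occasionally replace an individual with a fresh random tree."""
    return random_tree(2) if random.random() < 0.2 else t

pop = [random_tree() for _ in range(50)]
for gen in range(30):
    pop.sort(key=fitness, reverse=True)
    elite = pop[:10]                    # selection: keep the best
    children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                for _ in range(40)]     # genetic operations
    pop = elite + children              # next generation

best = max(pop, key=fitness)
```

In the paper, the individuals are instead ranking functions combining multimodal evidence, and fitness is measured by retrieval effectiveness on training queries.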
Experiments
In this section, we report the experiments performed to validate our multimodal learning-to-rank approach. As already mentioned, we adopted two metrics to evaluate the methods: P@10 and MAP. We applied Student’s t-test to check whether the differences in the results obtained by each method are statistically significant, considering p < 0.05. Thus, whenever we mention statistically significant differences when comparing two methods, we
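For reference, the two metrics can be computed as in the following minimal sketch, where `ranked` is the retrieved list for one query and `relevant` is the set of items judged relevant for it; MAP is the average of the per-query average precision:

```python
def precision_at_k(ranked, relevant, k=10):
    """P@k: fraction of the top-k retrieved items that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def average_precision(ranked, relevant):
    """AP: mean of the precision values at each rank where a
    relevant item appears, normalized by the number of relevant items."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            score += hits / i
    return score / len(relevant) if relevant else 0.0
```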
Conclusions and future work
We proposed automatically expanding visual queries in CBIR systems using evolutionary approaches to improve the quality of results when multimodal information related to the images is available. We implemented and experimented with this idea in the scenario of searching for fashion products and accessories on e-commerce Web sites. This application is becoming very relevant, given the growth in sales of this sort of product and the popularization of mobile devices capable of capturing images.
Acknowledgments
The authors thank the following funding agencies for their financial support: CAPES, CNPq (grant nos. 306580/2012-8, 308130/2014-6, and InWeb 573871/2008-6), FAPESP, FAPEAM (FIXAM and E-Vox Project), and FAPEMIG.
References
- et al. (2015). Improving distance based image retrieval using non-dominated sorting genetic algorithm. Pattern Recognition Letters.
- et al. (2011). Relevance feedback based on genetic programming for image retrieval. Pattern Recognition Letters.
- et al. (2007). A survey of content-based image retrieval with high-level semantics. Pattern Recognition.
- et al. (2011). Active learning paradigms for CBIR systems based on optimum-path forest classification. Pattern Recognition.
- et al. (2009). A genetic programming framework for content-based image retrieval. Pattern Recognition.
- et al. (2006). CSIFT: a SIFT descriptor with color invariant characteristics. IEEE CVPR.
- et al. (2012). Fusion of local and global descriptors for content-based image and video retrieval. Iberoamerican Congress on Pattern Recognition.
- et al. (2011). Dynamic two-stage image retrieval from large multimodal databases. Advances in Information Retrieval.
- et al. (2011). Modern information retrieval: the concepts and technology behind search.
- et al. (2007). Representing shape with a spatial pyramid kernel. ACM CIVR.
- Statistics for experimenters: an introduction to design, data analysis, and model building.
- Multimodal retrieval with relevance feedback based on genetic programming. Multimedia Tools and Applications.
- A comparative study of learning-to-rank techniques for tag recommendation. Journal of Information and Data Management.
- CEDD: color and edge directivity descriptor: a compact descriptor for image indexing and retrieval. ICVS.
- FCTH: fuzzy color and texture histogram, a low-level feature for accurate image retrieval. WIAMIS.
- Selection of the proper compact composite descriptor for improving content based image retrieval. Proceedings of the 6th IASTED International Conference.
- iLike: integrating visual and textual features for vertical search. ACM MM.
- Total recall: automatic query expansion with a generative feature model for object retrieval. ICCV.
- Semantic combination of textual and visual information in multimedia retrieval. ACM ICMR.
- LePref: learn to precompute evidence fusion for efficient query evaluation. Journal of the Association for Information Science and Technology.
- Real time Google and Live image search re-ranking. ACM MM.
- Learning to rank for content-based image retrieval. ACM MIR.
- Using factorial experiments to evaluate the effect of genetic programming parameters. EuroGP.
- An efficient boosting algorithm for combining preferences. The Journal of Machine Learning Research.