Elsevier

Knowledge-Based Systems

Volume 52, November 2013, Pages 201-213

Care more about customers: Unsupervised domain-independent aspect detection for sentiment analysis of customer reviews

https://doi.org/10.1016/j.knosys.2013.08.011

Highlights

  • We propose a model for detecting explicit and implicit aspects in sentiment analysis.

  • The model performs the detection by finding single- and multi-word aspects, filtering them with the A-score metric, and pruning.

  • Experimental results show considerable improvements of the proposed model over conventional techniques.

Abstract

With the rapid growth of user-generated content on the internet, automatic sentiment analysis of online customer reviews has recently become a hot research topic. However, because of the variety and wide range of products and services reviewed online, supervised and domain-specific models are often impractical. As the number of reviews grows, it is essential to develop an efficient sentiment analysis model that can extract product aspects and determine the sentiments expressed about them. In this paper, we propose a novel unsupervised and domain-independent model for detecting explicit and implicit aspects in reviews for sentiment analysis. First, a generalized method learns multi-word aspects, and a set of heuristic rules takes into account the influence of an opinion word on detecting the aspect. Second, a new metric based on mutual information and aspect frequency scores aspects within a new iterative bootstrapping algorithm, which works with an unsupervised seed set. Third, two pruning methods based on the relations between aspects in reviews remove incorrect aspects. Finally, the model uses explicit aspects and opinion words to identify implicit aspects: utilizing an extracted polarity lexicon, it maps each opinion word in the lexicon to the set of pre-extracted explicit aspects with a co-occurrence metric. The proposed model was evaluated on a collection of English product review datasets. It does not require any labeled training data and can easily be applied to other languages or to other domains such as movie reviews. Experimental results show considerable improvements of our model over conventional techniques, including unsupervised and supervised approaches.
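The mapping from opinion words to explicit aspects described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The co-occurrence metric is assumed here to be a raw sentence-level count, sentences are assumed to be pre-tokenized, and the function names `map_opinion_words_to_aspects` and `infer_implicit_aspects` are hypothetical.

```python
from collections import Counter, defaultdict


def map_opinion_words_to_aspects(sentences, explicit_aspects, opinion_words):
    """Map each opinion word to the explicit aspect it co-occurs with most.

    Assumption: co-occurrence is counted once per sentence; the metric used
    in the paper is not given in this excerpt.
    """
    cooc = defaultdict(Counter)
    for tokens in sentences:
        present = set(tokens)
        for word in present & opinion_words:
            for aspect in present & explicit_aspects:
                cooc[word][aspect] += 1
    # Keep the most frequent aspect for every opinion word seen with one.
    return {word: counts.most_common(1)[0][0] for word, counts in cooc.items()}


def infer_implicit_aspects(tokens, mapping, explicit_aspects):
    """Return inferred aspects for a sentence that names no explicit aspect."""
    if set(tokens) & explicit_aspects:
        return []  # an explicit aspect is present, nothing to infer
    return sorted({mapping[t] for t in tokens if t in mapping})


# Toy data, invented for illustration only.
sentences = [
    ["the", "price", "is", "expensive"],
    ["expensive", "price", "for", "this", "camera"],
    ["the", "screen", "is", "bright"],
    ["too", "expensive", "screen"],
]
explicit_aspects = {"price", "screen"}
opinion_words = {"expensive", "bright"}

mapping = map_opinion_words_to_aspects(sentences, explicit_aspects, opinion_words)
implicit = infer_implicit_aspects(["it", "is", "too", "expensive"], mapping, explicit_aspects)
print(mapping["expensive"], implicit)
```

In this toy data, "expensive" co-occurs with "price" twice and with "screen" once, so a sentence such as "it is too expensive" is assigned the implicit aspect "price". Multi-word aspects and any normalization of the counts are left out of the sketch.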

Introduction

With the rapid growth of user-generated content on the internet, the number of customer reviews that a product or service receives is increasing quickly. A significant number of websites, blogs and forums (e.g., www.amazon.com, rottentomatoes.com, epinions.com) allow customers to post opinions about a variety of products or services. This online word-of-mouth behavior introduces a new and important source of information for business intelligence and marketing. In other words, customer reviews are essential to other potential customers, retailers and product manufacturers (potential users) in their efforts to understand the general opinions of customers and to make better decisions. As the number of customer reviews expands, it becomes very hard for users to obtain a comprehensive view of previous customers' opinions about various aspects of products through manual analysis. Proper analysis and summarization of customer reviews can enable potential users to visualize previous positive and negative opinions about specific features or aspects of products. It is therefore highly desirable to produce an automatic analysis or summary of customer reviews.

For the past few years, sentiment analysis (or opinion mining) for online customer reviews has attracted a great deal of attention from researchers in data mining and natural language processing [1], [3], [5], [7], [8], [11], [9], [24], [25], [27], [33].

Sentiment analysis is a type of text analysis under the broad area of text mining and computational intelligence. Three fundamental problems in sentiment analysis are: aspect detection, opinion word detection and sentiment orientation identification [24], [27], [33].

Aspects are topics on which opinions are expressed. In the field of sentiment analysis, other names for aspects are features, product features or opinion targets [3], [5], [7], [8], [6], [12], [24], [27], [33]. Aspects are important because, without knowing them, the opinions expressed in a sentence or a review are of limited use. For example, in the review sentence “after using iPod, I found the size to be perfect for carrying in a pocket”, “size” is the aspect for which an opinion is expressed. Aspect detection is also critical to sentiment analysis because its effectiveness dramatically affects the performance of opinion word detection and sentiment orientation identification. Therefore, in this study we concentrate on aspect detection for sentiment analysis.

Existing aspect detection methods can broadly be classified into two major approaches: supervised and unsupervised. Supervised aspect detection approaches require a set of pre-labeled training data. Although supervised approaches can achieve reasonable effectiveness, building sufficient labeled data is often expensive and requires much human labor. Since unlabeled data are generally publicly available, it is desirable to develop models that work with unlabeled data. Additionally, because of the variety and wide range of products and services being reviewed on the internet, supervised, domain-specific or language-dependent models are often hard to apply. We therefore conclude that a framework for aspect detection must be robust and easily transferable between domains or languages.

In this paper, we present a novel unsupervised model that addresses the core tasks necessary to detect explicit and implicit aspects from review sentences in a sentiment analysis system. Our model differs from existing techniques in that it requires no labeled training data or additional information, not even for the initial seed information. The model can therefore easily be transferred between domains or languages. The proposed model is based on the observation that there is inter-relation information between the aspects in reviews, where inter-relation information is the probability of the co-occurrence of two aspects in a review. The model thus explores the review dataset by using both frequency-based and inter-relation information to find the aspects. Furthermore, we have found that opinion words and aspects are themselves related in opinionated sentences. Finally, the model uses the extracted explicit aspects and opinion words to detect implicit aspects.
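The inter-relation information defined above, the probability that two aspects co-occur in a review, can be estimated with a short Python sketch. It assumes each review has already been reduced to the set of aspects it mentions and uses the simple relative frequency over reviews as the estimate. The function name `cooccurrence_probability` and the toy data are hypothetical, and the paper may estimate this quantity differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_probability(reviews):
    """Estimate P(a, b) as the fraction of reviews mentioning both a and b.

    `reviews` is a list of aspect collections, one per review (an assumed
    input format). Pairs are keyed in alphabetical order.
    """
    pair_counts = Counter()
    for aspects in reviews:
        # Count each unordered aspect pair at most once per review.
        for a, b in combinations(sorted(set(aspects)), 2):
            pair_counts[(a, b)] += 1
    n = len(reviews)
    return {pair: count / n for pair, count in pair_counts.items()}


# Toy data, invented for illustration only.
reviews = [
    {"battery", "screen"},
    {"battery", "screen", "price"},
    {"price"},
    {"battery"},
]
probs = cooccurrence_probability(reviews)
print(probs[("battery", "screen")])
```

Here "battery" and "screen" appear together in two of the four reviews, giving an estimated co-occurrence probability of 0.5. A candidate that rarely co-occurs with any other aspect would receive low inter-relation support under this estimate.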

In the remainder of this paper, Section 2 defines aspect-level sentiment analysis, and Section 3 discusses existing work on aspect detection in detail. Section 4 describes the proposed aspect detection model for sentiment analysis, including the overall process and specific aspects of the design of the workflow. Section 5 describes our empirical evaluation and discusses the major experimental results. Finally, Section 6 concludes with a summary and some future research directions.

Section snippets

Aspect-level sentiment analysis

Opinions can be expressed about anything, e.g., a topic, a product, a service, an individual, an event, an organization or any attribute of them. Hence we use the notion of aspect to denote the target object that has been evaluated. An opinion (as expressed by means of opinion words) is a positive or negative sentiment, attitude, emotion or appraisal about an aspect. Positive and negative are called sentiment or opinion orientations [10], [6]. In general there are two types of reviews:

Related works

Several methods have been proposed, mainly in the context of product review mining in a broad range of study fields, from document to aspect level sentiment analysis for standard, ironic or spam reviews [3], [7], [8], [6], [12], [16], [21], [27], [33], [18], [19]. In the review mining task, aspects usually refer to opinion targets and product features, which are defined as product components or attributes. Existing aspect and product feature extraction techniques use both supervised and

Aspect detection model for sentiment analysis

Fig. 3 gives the architectural overview of the proposed model for detecting explicit and implicit aspects in sentiment analysis. The basic hypotheses of this model are that frequency-based and inter-relation information about the aspects should be used together, that the influence of an opinion word in the review sentence should be exploited, and that multi-word aspects should be given more importance. The model shows that using these hypotheses together attains highly effective results for product aspect extraction.

The

Experimental results

In this section we discuss the experimental results for the proposed model and the presented algorithms. To report the effectiveness of our model, we first evaluate the results for each individual step of our model, and then we compare the results with the benchmarked results of Wei et al. [27] and Somprasertsri and Lalitrojwong [21]. Finally, we discuss the identification of implicit aspects. In the following, data collection, evaluation measures and important evaluation results will be

Conclusions

In this research we study sentiment analysis and opinion mining for online reviews. When mining online reviews, it is often expensive and time-consuming to construct labeled data for training purposes, and it is desirable to develop a model or algorithm that can do without labeled data. In this paper we therefore proposed an unsupervised, domain- and language-independent model for detecting explicit and implicit aspects from the reviews. The proposed model is able to deal with three

Acknowledgments

The authors thank Dr. Djoerd Hiemstra for his invaluable comments and suggestions and gratefully acknowledge the hospitality offered to the first author by the Human Media Interaction (HMI) group at the University of Twente. The research of the last author of this paper is partially supported by the Dutch National FES Program COMMIT.

References (35)

  • B. Liu et al.

    Opinion observer: analyzing and comparing opinions on the web

  • B. Liu et al.

    A survey of opinion mining and sentiment analysis

    Mining Text Data

    (2012)
  • C. Lin et al.

    Weakly supervised joint sentiment-topic detection from text

    IEEE Transactions on Knowledge and Data Engineering

    (2012)
  • G. Mangnoesing, A. van Bunningen, A. Hogenboom, F. Hogenboom, F. Frasincar, An empirical study for determining relevant...
  • M.P. Marcus et al.

    Building a large annotated corpus of English: the Penn Treebank

    Computational Linguistics

    (1993)
  • S. Moghaddam et al.

    ILDA: interdependent LDA model for learning latent aspects and their ratings from online product reviews

  • H. Nakagawa et al.

    Automatic term recognition based on statistics of compound nouns and their components

    Terminology

    (2003)

Cited by (119)

    • Cross-Domain Aspect Detection and Categorization using Machine Learning for Aspect-based Opinion Mining

      2022, International Journal of Information Management Data Insights
      Citation Excerpt :

      They used lexicon for finding sentiment polarity. Bagheri, Mohamad and Franciska (Bagheri, Mohamad and Franciska 2013) proposed Aspect Detection Model based on LDA (ADMLDA). This model is based on Markov Chain and doesn't consider bag of words.

    • Intelligent product redesign strategy with ontology-based fine-grained sentiment analysis

      2021, Artificial Intelligence for Engineering Design, Analysis and Manufacturing: AIEDAM