Care more about customers: Unsupervised domain-independent aspect detection for sentiment analysis of customer reviews

doi:10.1016/j.knosys.2013.08.011

Knowledge-Based Systems

Volume 52, November 2013, Pages 201-213

https://doi.org/10.1016/j.knosys.2013.08.011

Highlights

• We propose a model for detecting explicit and implicit aspects in sentiment analysis.
• The model detects aspects by finding single- and multi-word aspects, filtering them with the A-score metric, and pruning.
• Experimental results show considerable improvements of the proposed model over conventional techniques.

Abstract

With the rapid growth of user-generated content on the internet, automatic sentiment analysis of online customer reviews has become a hot research topic. However, due to the variety and wide range of products and services being reviewed on the internet, supervised and domain-specific models are often not practical. As the number of reviews expands, it is essential to develop an efficient sentiment analysis model that is capable of extracting product aspects and determining the sentiments for these aspects. In this paper, we propose a novel unsupervised and domain-independent model for detecting explicit and implicit aspects in reviews for sentiment analysis. First, a generalized method is proposed to learn multi-word aspects, and a set of heuristic rules is employed to take into account the influence of an opinion word on detecting the aspect. Second, a new metric based on mutual information and aspect frequency is proposed to score aspects with a new iterative bootstrapping algorithm, which works with an unsupervised seed set. Third, two pruning methods based on the relations between aspects in reviews are presented to remove incorrect aspects. Finally, the model employs an approach that uses explicit aspects and opinion words to identify implicit aspects: utilizing an extracted polarity lexicon, the approach maps each opinion word in the lexicon to the set of pre-extracted explicit aspects with a co-occurrence metric. The proposed model was evaluated on a collection of English product review datasets. The model does not require any labeled training data, and it can be easily applied to other languages or other domains such as movie reviews. Experimental results show considerable improvements of our model over conventional techniques, including unsupervised and supervised approaches.
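The final step of the abstract, mapping opinion words to explicit aspects by co-occurrence, can be illustrated with a small example. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `build_cooccurrence` and `map_implicit_aspect`, the toy sentences, and the use of raw sentence-level co-occurrence counts as the metric are all hypothetical choices, since the abstract does not define the co-occurrence metric.

```python
from collections import Counter, defaultdict

def build_cooccurrence(sentences, aspects, opinion_words):
    """Count how often each opinion word shares a sentence with each explicit aspect."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = set(sentence.lower().split())
        for word in opinion_words & tokens:
            for aspect in aspects & tokens:
                counts[word][aspect] += 1
    return counts

def map_implicit_aspect(sentence, counts, aspects):
    """Guess the implicit aspect of a sentence that names no explicit aspect."""
    tokens = sentence.lower().split()
    if aspects & set(tokens):
        return None  # an explicit aspect is present, so nothing needs inferring
    votes = Counter()
    for token in tokens:
        # Each opinion word votes for the aspects it co-occurred with.
        votes.update(counts.get(token, {}))
    if not votes:
        return None  # no known opinion word in the sentence
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): explicit aspects and opinion words are assumed
# to have been extracted by earlier steps of some pipeline.
aspects = {"price", "size", "battery"}
opinion_words = {"expensive", "small"}
sentences = [
    "the price is expensive",
    "too expensive price for this size",
    "the size is small",
    "small size fits my pocket",
    "the battery is small",
]

counts = build_cooccurrence(sentences, aspects, opinion_words)
print(map_implicit_aspect("this phone is expensive", counts, aspects))
print(map_implicit_aspect("it is too small", counts, aspects))
```

In this toy data, "expensive" co-occurs most often with "price", so a sentence such as "this phone is expensive" is assigned the implicit aspect "price" even though the word never appears in it. A real system would likely normalize the counts, for example with a mutual-information style score, rather than use raw frequencies.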

Introduction

With the rapid growth of user-generated content on the internet, the number of customer reviews that a product or service receives is increasing quickly. A significant number of websites, blogs and forums (e.g., www.amazon.com, rottentomatoes.com, epinions.com) allow customers to post opinions about a variety of products or services. This online word-of-mouth behavior introduces a new and important source of information for business intelligence and marketing. In other words, customer reviews are essential to other potential customers, retailers and product manufacturers (potential users) in their efforts to understand the general opinions of customers, and help them to make better decisions. As the number of customer reviews expands, it becomes very hard for users to obtain a comprehensive view of the opinions of previous customers about various aspects of products through manual analysis. Consequently, proper analysis and summarization of customer reviews can enable potential users to visualize previous positive and negative opinions about specific features or aspects of products. It is therefore highly desirable to produce an automatic analysis or summary of customer reviews.

For the past few years, sentiment analysis (or opinion mining) for online customer reviews has attracted a great deal of attention from researchers in data mining and natural language processing [1], [3], [5], [7], [8], [11], [9], [24], [25], [27], [33].

Sentiment analysis is a type of text analysis under the broad area of text mining and computational intelligence. Three fundamental problems in sentiment analysis are: aspect detection, opinion word detection and sentiment orientation identification [24], [27], [33].

Aspects are topics on which opinions are expressed. In the field of sentiment analysis, other names for aspects are features, product features or opinion targets [3], [5], [7], [8], [6], [12], [24], [27], [33]. Aspects are important because, without knowing them, the opinions expressed in a sentence or a review are of limited use. For example, in the review sentence “after using iPod, I found the size to be perfect for carrying in a pocket”, “size” is the aspect for which an opinion is expressed. Aspect detection is likewise critical to sentiment analysis, because its effectiveness dramatically affects the performance of opinion word detection and sentiment orientation identification. Therefore, in this study we concentrate on aspect detection for sentiment analysis.

Existing aspect detection methods can broadly be classified into two major approaches: supervised and unsupervised. Supervised aspect detection approaches require a set of pre-labeled training data. Although supervised approaches can achieve reasonable effectiveness, building sufficient labeled data is often expensive and requires much human labor. Since unlabeled data are generally publicly available, it is desirable to develop models that work with unlabeled data. Additionally, due to the variety and wide range of products and services being reviewed on the internet, supervised, domain-specific or language-dependent models are often hard to apply. We therefore conclude that a framework for aspect detection must be robust and easily transferable between domains or languages.

In this paper, we present a novel unsupervised model which addresses the core tasks necessary to detect explicit and implicit aspects from review sentences in a sentiment analysis system. Our model differs from existing techniques in that it requires no labeled training data or additional information, not even for the initial seed information. Therefore the model can easily be transferred between domains or languages. The proposed model is based on the observation that there is inter-relation information between the aspects in reviews. Inter-relation information is the probability of the co-occurrence of two aspects in a review. The model therefore explores the review dataset using both frequency-based and inter-relation information to find the aspects. Furthermore, we have found that opinion words and aspects themselves are related in opinionated sentences. Finally, the model uses the extracted explicit aspects and opinion words to detect implicit aspects.
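The definition of inter-relation information above can be made concrete with a small computation. The sketch below estimates the co-occurrence probability of two aspects as the fraction of reviews that mention both, alongside each aspect's review frequency. The relative-frequency estimator, the function name `aspect_statistics`, and the toy reviews are assumptions made for illustration; the text here does not state which estimator the model actually uses.

```python
from collections import Counter
from itertools import combinations

def aspect_statistics(reviews, aspects):
    """Return (frequency, probability) tables for a set of candidate aspects.

    frequency[a]        = number of reviews that mention aspect a
    probability[(a, b)] = fraction of reviews that mention both a and b,
                          with each pair stored in alphabetical order
    """
    frequency = Counter()
    pair_counts = Counter()
    for review in reviews:
        # Aspects present in this review, sorted so pairs have a stable order.
        present = sorted(aspects & set(review.lower().split()))
        frequency.update(present)
        pair_counts.update(combinations(present, 2))
    total = len(reviews)
    probability = {pair: n / total for pair, n in pair_counts.items()}
    return frequency, probability

# Toy reviews (hypothetical) with two candidate aspects.
reviews = [
    "great screen and long battery life",
    "the screen is sharp but the battery drains fast",
    "battery lasts two days",
    "nice screen",
]
frequency, probability = aspect_statistics(reviews, {"screen", "battery"})
print(frequency["screen"], frequency["battery"])
print(probability[("battery", "screen")])
```

Here both aspects appear in three of the four reviews and together in two, giving a co-occurrence probability of 0.5. Keeping the two tables separate mirrors the text's distinction between frequency-based and inter-relation information; how the two are combined into a single score is left open in this sketch.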

The remainder of this paper is organized as follows. Section 2 gives a definition of aspect-level sentiment analysis, and Section 3 discusses existing work on aspect detection in detail. Section 4 describes the proposed aspect detection model for sentiment analysis, including the overall process and specific aspects of the design of the workflow. Section 5 describes our empirical evaluation and discusses the major experimental results. Finally, Section 6 concludes with a summary and some future research directions.

Section snippets

Aspect-level sentiment analysis

Opinions can be expressed about anything, e.g., a topic, a product, a service, an individual, an event, an organization or any attribute of them. Hence we use the notion of aspect to denote the target object that has been evaluated. An opinion (as expressed by means of opinion words) is a positive or negative sentiment, attitude, emotion or appraisal about an aspect. Positive and negative are called sentiment or opinion orientations [10], [6]. In general there are two types of reviews:

Related works

Several methods have been proposed, mainly in the context of product review mining in a broad range of study fields, from document to aspect level sentiment analysis for standard, ironic or spam reviews [3], [7], [8], [6], [12], [16], [21], [27], [33], [18], [19]. In the review mining task, aspects usually refer to opinion targets and product features, which are defined as product components or attributes. Existing aspect and product feature extraction techniques use both supervised and

Aspect detection model for sentiment analysis

Fig. 3 gives the architectural overview of the proposed model used for detecting explicit and implicit aspects in sentiment analysis. The basic hypotheses of this model are that frequency-based and inter-relation information of the aspects should be used together, that the influence of an opinion word in the review sentence should be taken into account, and that multi-word aspects should be given more importance. The model shows that using these hypotheses together yields highly effective results for product aspect extraction.

The

Experimental results

In this section we discuss the experimental results for the proposed model and the presented algorithms. To report the effectiveness of our model, we first evaluate the results for each individual step of our model, and then we compare the results with the benchmark results of Wei et al. [27] and Somprasertsri and Lalitrojwong [21]. Finally, we discuss the identification of implicit aspects. In the following, data collection, evaluation measures and important evaluation results will be

Conclusions

In this research we study sentiment analysis and opinion mining for online reviews. When dealing with mining online reviews, it is often expensive and time consuming to construct labeled data for training purposes and it is desirable to develop a model or algorithm that can do without labeled data. In this paper we therefore proposed an unsupervised domain- and language-independent model for detecting explicit and implicit aspects from the reviews. The proposed model is able to deal with three

Acknowledgments

The authors thank Dr. Djoerd Hiemstra for his invaluable comments and suggestions and gratefully acknowledge the hospitality offered to the first author by the Human Media Interaction (HMI) group at the University of Twente. The research of the last author of this paper is partially supported by the Dutch National FES Program COMMIT.

References (35)

A. Balahur et al., Detecting implicit expressions of emotion in text: a comparative analysis, Decision Support Systems (2012)

A. Reyes et al., From humor recognition to irony detection: the figurative language of social media, Data & Knowledge Engineering (2012)

A. Reyes et al., Making objective decisions from subjective data: detecting irony in customers reviews, Decision Support Systems (2012)

C. Bosco et al., Developing corpora for sentiment analysis and opinion mining: the case of irony and Senti-TUT, IEEE Intelligent Systems (2013)

S. Brody et al., An unsupervised aspect-sentiment model for online reviews

T. Dunning, Accurate methods for the statistics of surprise and coincidence, Computational Linguistics (1993)

X. Fu et al., Multi-aspect sentiment analysis for Chinese online social reviews based on topic modeling and HowNet lexicon, Knowledge-Based Systems (2013)

A. Hogenboom et al., A statistical approach to star rating classification of sentiment, Management Intelligent Systems (2012)

M. Hu et al., Mining opinion features in customer reviews

M. Hu et al., Mining and summarizing customer reviews

B. Liu et al., Opinion observer: analyzing and comparing opinions on the web

B. Liu et al., A survey of opinion mining and sentiment analysis, Mining Text Data (2012)

C. Lin et al., Weakly supervised joint sentiment-topic detection from text, IEEE Transactions on Knowledge and Data Engineering (2012)

G. Mangnoesing, A. van Bunningen, A. Hogenboom, F. Hogenboom, F. Frasincar, An empirical study for determining relevant...

M.P. Marcus et al., Building a large annotated corpus of English: the Penn Treebank, Computational Linguistics (1993)

S. Moghaddam et al., ILDA: interdependent LDA model for learning latent aspects and their ratings from online product reviews

H. Nakagawa et al., Automatic term recognition based on statistics of compound nouns and their components, Terminology (2003)

