A Hybrid Approach for Aspect-Based Sentiment Analysis Using a Lexicalized Domain Ontology and Attentional Neural Models

This work focuses on sentence-level aspect-based sentiment analysis for restaurant reviews. We propose a two-stage sentiment analysis algorithm: first, a lexicalized domain ontology is used to predict the sentiment; when the ontology is inconclusive, a neural network with a rotatory attention mechanism (LCR-Rot) is used as a backup. Furthermore, we add two extensions to the backup algorithm. The first extension changes the order in which the rotatory attention mechanism operates (LCR-Rot-inv). The second extension iterates over the rotatory attention mechanism multiple times (LCR-Rot-hop). Using the SemEval-2015 and SemEval-2016 data, we conclude that the two-stage method outperforms the baseline methods, albeit by a small margin. Moreover, we find that the method that iterates multiple times over the rotatory attention mechanism performs best.

Tasks:
1. Target Extraction: extracting the target word or set of words present in the text
2. Aspect Detection: detecting the aspects from the text (aspects can be broader than targets)
   - Explicit Aspect Detection: aspects have associated targets in the text
   - Implicit Aspect Detection: aspects do not have associated targets in the text (they are implied by the text)

Approaches:
- Hybrid: mix knowledge-based reasoning with machine learning, e.g., BoW+Ont (Schouten and Frasincar, 2018)
- Two-Step Hybrid: first knowledge-based reasoning, then machine learning as backup, e.g., Ont+BoW (Schouten and Frasincar, 2018), which uses an SVM classifier as backup

We propose a two-step hybrid approach that performs first knowledge-based reasoning and then deep learning as backup, dubbed A Hybrid Approach for Aspect-Based Sentiment Analysis (HAABSA). We reuse the knowledge-based reasoning from Ont+BoW, but extend LCR-Rot for deep learning by:
- Inverting the attention order in the rotatory attention mechanism
- Performing multiple hops in the rotatory attention mechanism

Methodology
The methodology is based on two main steps:
1. Knowledge-Based Reasoning
2. Deep Learning

More precisely:
1.1 Compute the sentiment of each word in a sentence that is related to the current aspect, based on a lexicalized domain sentiment ontology
1.2 If only positive sentiment is found, classify the aspect as positive
1.3 If only negative sentiment is found, classify the aspect as negative
2. If both positive and negative sentiment are found, or no sentiment is found, apply the neural attention model to classify the aspect as positive, negative, or neutral (a sketch of this decision logic follows below)
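The two-step decision logic can be summarized in a few lines. In this minimal sketch, ontology_sentiments and lcr_rot_predict are hypothetical stand-ins for the ontology reasoner and the neural backup model:

    def classify_aspect(sentence, aspect):
        # step 1: sentiments of aspect-related words from the domain ontology
        sentiments = ontology_sentiments(sentence, aspect)  # e.g., {"positive"}
        if sentiments == {"positive"}:
            return "positive"
        if sentiments == {"negative"}:
            return "negative"
        # step 2: conflicting or missing sentiment -> neural attention model backup
        return lcr_rot_predict(sentence, aspect)  # "positive", "negative", or "neutral"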

Ontology
The ontology has three main classes:
- SentimentMention: specifies the mentions of sentiment
- SentimentValue: specifies the polarity, which can be Positive or Negative
- AspectMention: specifies the mentions of aspects

and two main (annotation) properties:
- lex: relates a mention to a lexical representation
- aspect: relates a sentiment mention to an aspect

In addition:
- The Neutral sentiment is not specified due to its ambiguous semantics
- We consider negation in two cases (see the sketch below):
  - The current word is related to a Negator through a dependency relation
  - One of the three words preceding the current word is a Negator
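A sketch of the two negation checks, using spaCy's dependency parse as one possible implementation; the Negator lexicon shown here is illustrative:

    import spacy

    nlp = spacy.load("en_core_web_sm")
    NEGATORS = {"not", "never", "no", "n't"}  # illustrative Negator lexicon

    def is_negated(doc, i):
        # Case 1: the current word is related to a Negator via a dependency relation
        if any(child.dep_ == "neg" for child in doc[i].children):
            return True
        # Case 2: one of the three preceding words is a Negator
        return any(doc[j].text.lower() in NEGATORS for j in range(max(0, i - 3), i))

    doc = nlp("The staff was not friendly at all.")
    # "friendly" (index 4) is negated via the "neg" dependency of "not"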

Sentiment Mention Types
There are three sentiment mention types:
- Type 1: generic sentiment mention, which has the same sentiment value for all aspects; e.g., "awesome" is always Positive (unless sarcasm is present, which we do not consider here)
- Type 2: aspect-dependent sentiment mention, which has the same sentiment value for some aspects (an extra check for matching the current aspect is needed); e.g., "delicious" is Positive for SustenanceMention (food and drinks) but does not apply to ServiceMention
- Type 3: context-dependent sentiment mention, which has different sentiment values for different aspects (an extra check for matching the current aspect is needed); new axioms are built from a sentiment mention linked by a dependency relation to the current aspect; e.g., "cold" + "beer" is Positive but "cold" + "pizza" is Negative

LCR-Rot Neural Model
Each sentence is split into three parts around the target phrase:
- Left context: $S^l = [s^l_1, s^l_2, \ldots, s^l_L]$, where L is the length of the left context
- Target phrase: $S^t = [s^t_1, s^t_2, \ldots, s^t_M]$, where M is the length of the target phrase (the relevant aspect)
- Right context: $S^r = [s^r_1, s^r_2, \ldots, s^r_R]$, where R is the length of the right context

The corresponding word embeddings are $[e^l_1, \ldots, e^l_L]$, $[e^t_1, \ldots, e^t_M]$, and $[e^r_1, \ldots, e^r_R]$, where $e_n \in \mathbb{R}^d$ and d is the dimension of the word embedding (usually d = 300). We apply three bi-directional long short-term memory (Bi-LSTM) modules (with 300 hidden units each) to these word embeddings (see the sketch below):
- Left Bi-LSTM: computes the left context hidden values $h^l_i$
- Target Bi-LSTM: computes the target hidden values $h^t_i$
- Right Bi-LSTM: computes the right context hidden values $h^r_i$
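A minimal PyTorch sketch of this three-part encoding; the part lengths and variable names are illustrative, while the Bi-LSTM configuration (300 hidden units per direction) follows the description above:

    import torch
    import torch.nn as nn

    d, hidden = 300, 300  # embedding dimension and hidden units per direction

    # one Bi-LSTM per sentence part, as described above
    bilstm_left = nn.LSTM(d, hidden, bidirectional=True, batch_first=True)
    bilstm_target = nn.LSTM(d, hidden, bidirectional=True, batch_first=True)
    bilstm_right = nn.LSTM(d, hidden, bidirectional=True, batch_first=True)

    # stand-ins for the word embeddings of the left context, target, right context
    L, M, R = 5, 1, 3
    e_left, e_target, e_right = torch.randn(1, L, d), torch.randn(1, M, d), torch.randn(1, R, d)

    h_left, _ = bilstm_left(e_left)        # [1, L, 2*hidden]: h^l_1 ... h^l_L
    h_target, _ = bilstm_target(e_target)  # [1, M, 2*hidden]: h^t_1 ... h^t_M
    h_right, _ = bilstm_right(e_right)     # [1, R, 2*hidden]: h^r_1 ... h^r_R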

Bi-LSTM Advantages
Remember relevant information for a long period of time

Target2Context Attention Mechanism
Computes the target-aware left context representation as follows:
- Determine a target representation using an average pooling layer:
  $r^{tp} = \frac{1}{M} \sum_{i=1}^{M} h^t_i$ (1)
- Determine the left context word attention values using a bilinear form involving the hidden values $h^l_i$, for $i = 1, \ldots, L$, and $r^{tp}$:
  $f(h^l_i, r^{tp}) = \tanh\left({h^l_i}^\top W^l_c \, r^{tp} + b^l_c\right)$ (2)
  where $W^l_c$ is a weight matrix and $b^l_c$ is a bias
- Squash the attention values to values between 0 and 1 using the softmax function:
  $\alpha^l_i = \frac{\exp(f(h^l_i, r^{tp}))}{\sum_{j=1}^{L} \exp(f(h^l_j, r^{tp}))}$ (3)
- Determine the target-aware left context representation as an attention-weighted average of the word hidden values:
  $r^l = \sum_{i=1}^{L} \alpha^l_i \, h^l_i$ (4)

By following Equations (2)-(4) in a similar way we can obtain $r^r$, the target-aware right context representation (see the sketch below).
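Equations (1)-(4) condense into a small helper. This PyTorch sketch reuses the hidden states from the previous sketch; the weight shapes and the scalar bias are assumptions:

    import torch

    def attend(h, query, W, b):
        # Equation (2): bilinear attention score per position, f = tanh(h_i^T W query + b)
        scores = torch.tanh(h @ W @ query + b)        # [N]
        # Equation (3): softmax squashes the scores into attention weights
        alpha = torch.softmax(scores, dim=0)          # [N], values in (0, 1)
        # Equation (4): attention-weighted average of the hidden values
        return (alpha.unsqueeze(1) * h).sum(dim=0)    # [2*hidden]

    # Equation (1): average pooling over the target hidden states
    r_tp = h_target.squeeze(0).mean(dim=0)            # [2*hidden]
    W_c_l = torch.randn(600, 600)                     # W^l_c (2*hidden x 2*hidden, assumed)
    b_c_l = torch.zeros(())                           # b^l_c
    r_l = attend(h_left.squeeze(0), r_tp, W_c_l, b_c_l)  # target-aware left context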

Context2Target Attention Mechanism
Computes an improved target representation as follows:
- Determine the target word attention values using a bilinear form involving the hidden values $h^t_i$, for $i = 1, \ldots, M$, and $r^l$:
  $f(h^t_i, r^l) = \tanh\left({h^t_i}^\top W^l_t \, r^l + b^l_t\right)$ (5)
  where $W^l_t$ is a weight matrix and $b^l_t$ is a bias
- Squash the attention values to values between 0 and 1 using the softmax function:
  $\alpha^{tl}_i = \frac{\exp(f(h^t_i, r^l))}{\sum_{j=1}^{M} \exp(f(h^t_j, r^l))}$ (6)
- Determine the left context-aware target representation as an attention-weighted average of the word hidden values:
  $r^{tl} = \sum_{i=1}^{M} \alpha^{tl}_i \, h^t_i$ (7)

By following Equations (5)-(7) in a similar way we can obtain $r^{tr}$, the right context-aware target representation.

Sentence Representation and Loss
The sentence representation is obtained by concatenating the target-aware left context representation $r^l$, the target-aware right context representation $r^r$, and the two context-aware target representations $r^{tl}$ and $r^{tr}$:
$v = [r^l; r^r; r^{tl}; r^{tr}]$ (8)
The sentence representation vector is converted by a linear layer to a vector of size C, where C is the number of sentiment categories, and the resulting vector is fed into a softmax layer to predict the sentiment polarity of the target phrase:
$p = \mathrm{softmax}(W_c v + b_c)$ (9)
where $p$ is a conditional probability distribution, $W_c$ is a weight matrix, and $b_c$ is a bias.

We minimize the cross-entropy loss function with $L_2$ regularization (see the sketch below):
$\mathcal{L} = -\sum_{j} y_j^\top \log p_j + \lambda \lVert \Theta \rVert^2$ (10)
where:
- $y_j$ is a vector containing the true sentiment value for the j-th training opinion
- $p_j$ is a vector containing the predicted sentiment for the j-th training opinion
- $\lambda$ is the weight of the $L_2$-regularization term
- $\Theta$ is the parameter set, which contains all weight matrices and biases of the model
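A sketch of Equations (8)-(10), assuming v is built from the four representations computed above; the parameter collection and dimensions are illustrative:

    import torch
    import torch.nn as nn

    C = 3  # sentiment categories: positive, negative, neutral
    linear = nn.Linear(4 * 600, C)  # maps v = [r^l; r^r; r^tl; r^tr] to C scores

    def predict(r_l, r_r, r_tl, r_tr):
        v = torch.cat([r_l, r_r, r_tl, r_tr])    # Equation (8): sentence representation
        return torch.softmax(linear(v), dim=-1)  # Equation (9): conditional distribution p

    def loss(p, y, params, lam):
        # Equation (10): cross-entropy plus L2 regularization over the parameter set
        # (softmax followed by log is written for clarity, not numerical stability)
        return -(y * torch.log(p)).sum() + lam * sum((w ** 2).sum() for w in params)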

Training
- For loss minimization we use backward error propagation
- We initialize the weight matrices from a uniform distribution U(−0.1, 0.1) and set all biases to zero, as done by Zheng and Xia (2018)
- To update the weights and biases we use stochastic gradient descent with momentum
- The dropout technique is applied to all hidden layers to avoid overfitting
- The following hyperparameters are tuned on 20% of the training data (the remaining 80% is used for model building) using a Tree-structured Parzen Estimator (TPE) algorithm:
  - the learning rate
  - the L2-regularization term (λ)
  - the dropout rate
  - the momentum term

LCR-Rot-inv
Inverts the attention order in the rotatory attention mechanism:
1. Apply the context2target attention mechanism
2. Apply the target2context attention mechanism
We start with two context pooling layers (mirroring Equation (1)):
$r^{lp} = \frac{1}{L} \sum_{i=1}^{L} h^l_i$ and $r^{rp} = \frac{1}{R} \sum_{i=1}^{R} h^r_i$

LCR-Rot-hop
Performs multiple hops over the rotatory attention mechanism: the rotatory attention step is repeated for a number of iterations (see the sketch below).
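A sketch of the LCR-Rot-hop iteration, reusing attend() and the hidden states from the sketches above (with the batch dimension squeezed out); feeding the refined target representations back as next-hop queries is our reading of "multiple hops", so the feedback step and parameter names are assumptions:

    import torch

    def rotatory_hops(h_left, h_target, h_right, params, n_hops=3):
        # initial queries: average-pooled target representation (Equation (1))
        q_l = q_r = h_target.mean(dim=0)
        for _ in range(n_hops):
            # target2context: context representations conditioned on the target queries
            r_l = attend(h_left, q_l, params["W_c_l"], params["b_c_l"])
            r_r = attend(h_right, q_r, params["W_c_r"], params["b_c_r"])
            # context2target: target representations conditioned on the new contexts
            r_tl = attend(h_target, r_l, params["W_t_l"], params["b_t_l"])
            r_tr = attend(h_target, r_r, params["W_t_r"], params["b_t_r"])
            # feed the refined target representations back as next-hop queries
            q_l, q_r = r_tl, r_tr
        return torch.cat([r_l, r_r, r_tl, r_tr])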

Data
We use the SemEval-2015 and SemEval-2016 restaurant review data. An example annotated sentence:

    <sentence id="1154550:1">
      <text>The place is small and cramped but the food is fantastic.</text>
      <Opinions>
        <Opinion target="place" category="AMBIENCE#GENERAL" polarity="negative" from="4" to="9"/>
        <Opinion target="food" category="FOOD#QUALITY" polarity="positive" from="39" to="43"/>
      </Opinions>
    </sentence>

The most dominant aspect is "FOOD#QUALITY" (around 40%).

Data cleaning:
- All words are converted to lowercase and lemmatized
- Sentences containing implicit aspects (target="NULL") are not considered (as we need the targets)
- Sentences without Opinions are also excluded (no use)

Word embeddings:
- GloVe word embedding vectors: 1.9 million vocabulary size with 300-dimensional vectors
- Words that do not appear in the GloVe vocabulary are randomly initialized from a normal distribution N(0, 0.05^2), as by Zheng and Xia (2018); only 3.6% of the words are not in the GloVe vocabulary, and these are often names of restaurants, jargon, or slang (see the sketch below)

Evaluation setup:
- Training is performed on the training data and testing is done on the official test data
- The evaluation metric is classification accuracy (the same as used in the SemEval competition)

Reference models:
- Ont: knowledge-based reasoning with majority polarity as backup (Schouten and Frasincar, 2018)
- BoW: bag-of-words method combined with an SVM classifier to determine sentiment (Schouten and Frasincar, 2018)
- CABASC: neural network that contains a context attention-based memory module (Liu et al., 2018)
- Ont+BoW: two-step approach where the ontology method is used first and the bag-of-words method as backup (Schouten and Frasincar, 2018)
- Ont+CABASC: two-step approach where the ontology method is used first and the CABASC method as backup (new baseline)
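A sketch of the embedding setup described above; the GloVe file name (the 1.9M-vocabulary, 300-dimensional release) and the random seed are assumptions:

    import numpy as np

    def load_embeddings(vocab, path="glove.42B.300d.txt", d=300):
        glove = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                word, *values = line.rstrip().split(" ")
                glove[word] = np.asarray(values, dtype=np.float32)
        rng = np.random.default_rng(0)
        # out-of-vocabulary words (restaurant names, jargon, slang) are drawn
        # from N(0, 0.05^2), as by Zheng and Xia (2018)
        return {w: glove[w] if w in glove else
                   rng.normal(0.0, 0.05, d).astype(np.float32) for w in vocab}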

Future Work
- Develop a learning algorithm for the lexicalized domain sentiment ontology (reusing work on ontology learning)
- Propose a solution to deal with implicit aspects (finding target proxies based on word similarity)